Bioinformatics

Abstract

Molecular biologists generate data at an unparalleled pace, and the demands and opportunities for interpreting such data are expanding faster than ever. In this context, bioinformatics has emerged as a strategic frontier involving molecular biology, molecular evolution, information technology and statistics. Bioinformatics may be defined as the research, development or application of computational tools and approaches for expanding the use of biological data, including tools to acquire, store, organise, archive, analyse or visualise such data. Its goal is to enable biological discovery based on existing data or, in other words, to transform biological data into information and eventually into knowledge.

Key Concepts

  • Biological data are growing faster than the computing power that Moore's law predicts, so information technology may become a bottleneck in data analysis.
  • The amount of data produced by biologists has triggered a boom in specialised databases.
  • Nucleotide sequences are read in small stretches that need to be assembled computationally into whole molecules.
  • Searching sequence databases with an unknown sequence as the query, the so‐called similarity search, is probably the most frequent bioinformatic activity.
  • Translational bioinformatics is key to successful precision medicine.
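
Two of the concepts above, assembling short reads into a longer molecule and ranking database sequences by similarity to a query, can be illustrated with a toy sketch. This is not how production tools work: the greedy overlap merge and the k‐mer Jaccard score below are simplifications chosen for brevity, and the function names, the `min_len` and `k` parameters and the example sequences are all hypothetical.

```python
"""Toy illustrations of sequence assembly and similarity search.

Assumptions (not from the article): reads are error-free strings over
A/C/G/T, all on the same strand, and the "database" is a small dict.
"""


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of all substrings of length `k` in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_similarity(query: str, database: dict[str, str], k: int = 3):
    """Rank database entries by Jaccard similarity of their k-mer sets."""
    q = kmers(query, k)
    scores = []
    for name, seq in database.items():
        s = kmers(seq, k)
        union = q | s
        scores.append((name, len(q & s) / len(union) if union else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reads that overlap by four nucleotides each.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
    # prints ['ATGGCGTACCTGA']

    # Hypothetical three-entry "database" searched with a short query.
    db = {"seqA": "ATGGCGTACCTGA", "seqB": "TTTTTTTTTTTTT", "seqC": "ATGGCGTTTTTGA"}
    for name, score in rank_by_similarity("GGCGTACC", db):
        print(name, round(score, 2))
```

Real assemblers and search tools must additionally cope with sequencing errors, repeats, both strands and databases far too large to scan exhaustively, which is why they rely on graph‐based assembly and indexed, heuristic alignment rather than the exhaustive comparisons sketched here.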

Keywords: computational biology; biological database; sequence analysis; protein analysis; sequence assembly; gene prediction; similarity search; protein structure prediction; systems biology; phylogenetic analysis

Figure 1. Growth of biological information measured by the number of nucleotides deposited in NCBI's GenBank and the number of nucleotides deposited by large sequencing projects (Whole Genome Shotgun, WGS).
Figure 2. Bioinformatics is a research program that emerged at the crossroads of other scientific disciplines.
Figure 3. Word clouds (https://www.wordclouds.com) generated from the abstracts of papers assigned either the ‘bioinformatics’ keyword (a and b) or the ‘computational biology’ keyword (c and d). All data were retrieved from PubMed (http://www.ncbi.nlm.nih.gov/pubmed/). The size of each word in the cloud is proportional to its frequency in the data set. These graphs show how the field of bioinformatics has evolved, as seen through the topics on which the community focused. Comparing (a) with (c) and (b) with (d) shows that bioinformatics and computational biology are indistinguishable in terms of the topics they address.

Further Reading

Bromham L (2016) An Introduction to Molecular Evolution and Phylogenetics. Oxford, UK/New York: Oxford University Press.

Compeau P and Pevzner P (2015) Bioinformatics Algorithms: An Active Learning Approach. La Jolla, CA: Active Learning Publishers.

Lesk AM (2017) Introduction to Genomics. Oxford, UK: Oxford University Press.

Vamathevan J and Birney E (2017) A review of recent advances in translational bioinformatics: bridges from biology to medicine. Yearbook of Medical Informatics 26: 178–187.

Voit EO (2018) A First Course in Systems Biology. New York: Garland Science.

Wu CH, Arighi CN and Ross KE (2017) Protein Bioinformatics: From Protein Modifications and Networks to Proteomics. New York: Humana Press.

How to Cite
Makałowski, Wojciech, Shabardina, Victoria, and Makałowska, Izabela (Aug 2018) Bioinformatics. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005247.pub3]