Bioinformatics

Abstract

Molecular biologists generate data at an unparalleled pace, and the demands and opportunities for interpreting such data are expanding more than ever. In this context, bioinformatics has emerged as a strategic frontier involving molecular biology, molecular evolution, information technology and statistics. Bioinformatics may be defined as the research, development, or application of computational tools and approaches for expanding the use of biological data, including those to acquire, store, organise, archive, analyse, or visualise such data. Its goal is to enable biological discovery based on existing data, or in other words to transform biological data into information, and eventually into knowledge.

Key Concepts

  • Biological data are growing faster than the computing capacity predicted by Moore's law, so information technology may become a bottleneck in data analysis.

  • The volume of data produced by biologists has triggered a boom in specialised databases.

  • Nucleotide sequences are read in small stretches that need to be assembled computationally into whole molecules.

  • Searching sequence databases with an unknown sequence as the query, the so-called similarity search, is probably the most frequent bioinformatic activity.

  • Systems biology uses advanced mathematics to build complex models of biological activities.
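The assembly of short reads into longer molecules, mentioned in the key concepts above, can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy strategy (repeatedly merging the two reads with the longest suffix-prefix overlap); the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical and chosen for illustration only. Production assemblers rely on considerably more elaborate methods and must handle sequencing errors, repeats and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads until no pair overlaps by at least `min_len` bases.

    Assumption: reads are error-free and all come from the same strand.
    Returns the remaining contigs (a single one if everything could be merged).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the longest suffix-prefix overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGCGTA", "CGTACCT", "ACCTGA"]
    print(greedy_assemble(reads))
```

With the three toy reads shown, the sketch reconstructs the single sequence ATGCGTACCTGA. The greedy choice is easy to follow but can be misled by repeated regions, which is one reason real assemblers use graph-based formulations instead.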

Keywords: computational biology; biological database; sequence analysis; protein analysis; sequence assembly; gene prediction; similarity search; protein structure prediction; systems biology; phylogenetic analysis

Figure 1.

Growth of biological information measured by the number of nucleotides deposited in NCBI's GenBank and the number of nucleotides deposited by large sequencing projects (Whole Genome Shotgun).

Figure 2.

Bioinformatics is a research programme that emerged at the crossroads of other scientific disciplines.

Figure 3.

Word clouds (http://www.tagxedo.com/) generated from the abstracts of papers assigned the keyword ‘bioinformatics’ and published in the years 1991–2000 (a), 2001–2010 (b), and 2011–2013 (c). Panel (d) represents papers assigned the keyword ‘computational biology’ and published in 2011–2013. Data were retrieved from PubMed (http://www.ncbi.nlm.nih.gov/pubmed). The size of each word in a cloud is proportional to the frequency with which it occurs in the dataset. The clouds show how the field of bioinformatics has evolved, as reflected by the topics on which the community focused. A comparison of (c) and (d) shows that bioinformatics and computational biology are indistinguishable in terms of the topics they address.

Further Reading

Felsenstein J (2004) Inferring Phylogenies. Sunderland, MA: Sinauer Associates.

Higgs PG and Attwood TK (2004) Bioinformatics and Molecular Evolution. Malden, MA: Wiley‐Blackwell.

Lemey P, Salemi M and Vandamme AM (2009) The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis. Cambridge: Cambridge University Press.

Lesk A (2014) Introduction to Bioinformatics. Oxford: Oxford University Press.

Tuncbag N, Kar G, Keskin O, Gursoy A and Nussinov R (2009) A survey of available tools and web servers for analysis of protein‐protein interactions and interfaces. Briefings in Bioinformatics 10: 217–232.

How to Cite
Makałowski, Wojciech, Jąkalski, Marcin, and Makałowska, Izabela (Nov 2014) Bioinformatics. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005247.pub2]