Bioinformatics

Abstract

Bioinformatics is a discipline at the intersection of biology, computer science, information technology and mathematics. It integrates and analyses a wealth of biological data in order to identify and assign a function to each item in the parts list of a living organism, and to understand, at a systems level, the complex processes that define life. Its applications include the construction of genetic and physical maps of genomes, gene discovery, the inference of the molecular function and three‐dimensional structure of gene products, the interpretation of the effect of gene variations on the phenotype, the reconstruction of interaction and signal transduction pathways and the simulation of biological systems. Bioinformatics is an essential part of modern biology and a key player in the quest for a complete systems‐level understanding of the living cell and of the organism.

Key concepts

  • Biological data are being produced at an unprecedented speed and need to be organized and integrated.

  • Bioinformatics, or computational biology, is the discipline that uses computational tools for analysing biological data and adds ‘biological’ value to them.

  • The complete DNA sequences of many genomes, including that of Homo sapiens, have been elucidated and need to be annotated, that is, associated with their biological function.

  • Proteins, the products of genes, perform most of the functions of life.

  • The function of a protein can be deduced from its evolutionary history or by analysing the three‐dimensional shape that it assumes.

  • The evolutionary history of a protein (a gene) can be deduced by comparing its sequence with those of all other known proteins (or genes).

  • The number of possible three‐dimensional structures of a protein is astronomically large, and the correct one cannot, as of today, be selected on the basis of first principles.

  • Empirical methods, based on evolution or on the body of experimental knowledge on protein structures, can be used to infer the unknown structure of a protein.

  • The future challenge of bioinformatics is to use and combine the available data on sequences, three‐dimensional structures, interactions, abundance and temporal patterns of expression to obtain a systems‐level understanding of life.
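
The sequence comparison mentioned in the key concepts above can be illustrated with a small sketch. It computes the score of the best global alignment of two sequences by dynamic programming, in the style of the Needleman and Wunsch method. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only; real database searches rely on substitution matrices, more elaborate gap penalties and heuristic tools such as BLAST.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not a standard scheme.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    # Before any residue of a is used, b[:j] can only be aligned to j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Pair a[i-1] with a gap in b.
            up = prev[j] + gap
            # Pair b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "AGT"))
```

A higher score indicates a closer match under the chosen scheme. Whether a given score reflects a genuine evolutionary relationship is a separate statistical question that this sketch does not address.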

Keywords: sequence analysis; nucleic acids; human genome project; protein structure; gene map

Further Reading

Lattman EE and Loll PJ (2008) Protein Crystallography: A Concise Guide. Baltimore, USA: Johns Hopkins University Press.

Lesk AM (2002) Introduction to Bioinformatics. Oxford, UK: Oxford University Press. ISBN/ISSN: 9780199251964.

Lesk AM (2007) Introduction to Genomics. Oxford, UK: Oxford University Press. ISBN/ISSN: 9780199296958.

Tramontano A (2006) Protein Structure Prediction: Concepts and Applications. Weinheim, Germany: Wiley. ISBN/ISSN: 978352731167‐5.

Wüthrich K (1986) NMR of Proteins and Nucleic Acids. New York, USA: Wiley.

How to Cite

Tramontano, A (Sep 2009) Bioinformatics. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0001900.pub2]