Profile Searching

Abstract

A profile is a computational representation of the sequence properties of a family of related proteins. Like single sequences, profiles can be used to search databases with a dynamic programming algorithm. Because profiles allow position-specific scoring systems and gap parameters, profile searches are considerably more sensitive than single-sequence searches at detecting distant protein relationships.
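The following is a minimal sketch of how a profile can be searched against a sequence with dynamic programming. It uses a Smith-Waterman-style local alignment in which each profile column has its own match scores and its own gap penalties. The sketch makes several assumptions for illustration. The profile is stored as one residue-to-score dictionary per column. Gap costs are linear, although real profiles usually use affine, state-specific costs. The function name `profile_local_score` and its parameters are hypothetical and do not come from any particular package.

```python
from typing import Dict, Sequence

Column = Dict[str, float]  # residue -> match score for one profile column


def profile_local_score(
    profile: Sequence[Column],
    seq: str,
    deletion: Sequence[float],
    insertion: Sequence[float],
    default: float = -2.0,
) -> float:
    """Best local alignment score of `seq` against `profile` (hypothetical helper).

    deletion[i]  : penalty for skipping profile column i.
    insertion[i] : penalty per sequence residue inserted after profile column i.
    Residues missing from a column's dictionary score `default`.
    Gap penalties are linear here, an assumed simplification.
    """
    m, n = len(profile), len(seq)
    if len(deletion) != m or len(insertion) != m:
        raise ValueError("need one deletion and one insertion penalty per column")
    # H[i][j]: best score of a local alignment ending after profile column i
    # and sequence residue j (both 1-based); row/column 0 stay at zero.
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        column = profile[i - 1]
        for j in range(1, n + 1):
            match = H[i - 1][j - 1] + column.get(seq[j - 1], default)
            delete = H[i - 1][j] - deletion[i - 1]    # profile column skipped
            insert = H[i][j - 1] - insertion[i - 1]   # extra residue after column
            H[i][j] = max(0.0, match, delete, insert)  # 0.0 allows a local restart
            best = max(best, H[i][j])
    return best


# Toy three-column profile; only the listed residue scores positively.
toy = [{"A": 3.0}, {"C": 3.0}, {"D": 3.0}]
uniform = profile_local_score(toy, "ACXD", [5, 5, 5], [5, 5, 5])
cheap = profile_local_score(toy, "ACXD", [5, 5, 5], [5, 1, 5])
print(uniform, cheap)  # 6.0 8.0
```

In the usage example, the only difference between the two calls is a cheaper insertion penalty after the second column. That change lets the search bridge the extra residue `X` and extend the hit from `AC` to `ACXD`. This shows how position-specific gap parameters change which alignments a profile search reports.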

Keywords: sequence analysis; alignment; evolutionary distance; homology; statistical significance

Figure 1.

Short sample profile, in generalized profile format: (a) small sample alignment of seven sequences with four residues each; (b) generalized profile derived from the alignment in (a). The lines starting with ‘/M’ contain the actual profile data; the other lines contain additional information such as the alphabet used, default gap parameters and normalization parameters.
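The sketch below shows, in simplified form, how a table of position-specific scores can be derived from an alignment like the one in Figure 1. It uses a hypothetical toy alignment of seven sequences with four residues each, not the sequences shown in the figure. The scores are log-odds values computed with a flat pseudocount and a uniform background, a technique swapped in here for illustration. The output is not in the generalized profile format, and the names `ALIGNMENT` and `build_profile` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment: seven sequences, four residues each.
# These are NOT the sequences shown in Figure 1.
ALIGNMENT = ["ACDE", "ACDE", "SCDE", "ACNE", "ACDQ", "GCDE", "ACDE"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, alphabet=ALPHABET, pseudocount=1.0):
    """Return one dict per column mapping residue -> log-odds score in bits.

    Assumes a uniform background frequency (1/20 per residue) and adds the
    same pseudocount to every residue; practical profile builders use
    sequence weights and substitution-matrix-based pseudocounts instead.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(alphabet)
        column = {}
        for res in alphabet:
            freq = (counts[res] + pseudocount) / total
            column[res] = math.log2(freq / background)  # >0: enriched, <0: depleted
        profile.append(column)
    return profile


profile = build_profile(ALIGNMENT)
consensus = "".join(max(col, key=col.get) for col in profile)
print(consensus)  # ACDE
```

A fully conserved column, such as the all-`C` second column, receives a higher score for its residue than a partially conserved one. Residues never observed in a column get negative scores. This position dependence is what distinguishes a profile from a single-sequence query scored with one substitution matrix.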

Figure 2.

Representation of the iterative refinement process.


Further Reading

Agarwal P and States DJ (1998) Comparative accuracy of methods for protein sequence similarity search. Bioinformatics 14: 40–47.

Attwood TK (2000) The role of pattern databases in sequence analysis. Briefings in Bioinformatics 1: 45–59.

Birney E, Thompson JD and Gibson TJ (1996) PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Research 24: 2730–2739.

Bork P and Gibson TJ (1996) Applying motif and profile searches. Methods in Enzymology 266: 162–184.

Doolittle RF (1994) Protein sequence comparisons: searching databases and aligning sequences. Current Opinion in Biotechnology 5: 24–28.

Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14: 755–763.

Hofmann K (1998) Protein classification & functional assignment. Bioinformatics: A Trends Guide 5: 18–21.

Mott R and Tribe R (1999) Approximate statistics of gapped alignments. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 6: 91–112.

Tatusov RL, Altschul SF and Koonin EV (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proceedings of the National Academy of Sciences of the United States of America 91: 12091–12095.

Vingron M and Waterman MS (1994) Sequence alignment and penalty choice: review of concepts, case studies and implications. Journal of Molecular Biology 235: 1–12.

How to Cite
Hofmann, Kay (Sep 2005) Profile Searching. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005259]