Similarity Search


Protein and DNA similarity searches are used to find sequences that are likely to be homologous, that is, sequences that diverged from a common ancestor. When two sequences share statistically significant similarity (much more than would be expected by chance), the most parsimonious explanation for the excess similarity is homology. The most effective similarity searches use protein:protein or translated‐DNA:protein comparisons.
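To make "more than would be expected by chance" concrete, the sketch below uses the standard relation between a normalized bit score S' and the expected number of chance hits, E = m · n · 2^(−S'), where m is the query length and n is the total database length in residues. The function names and the 0.001 cutoff are illustrative assumptions, not values taken from this article; real search programs also apply length corrections that are omitted here.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bit_score`.

    Uses E = m * n * 2^(-S'), with m = query length and n = database
    length in residues. Edge-effect corrections are ignored (assumption).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def looks_significant(bit_score: float, query_len: int, db_len: int,
                      threshold: float = 1e-3) -> bool:
    """Flag a hit as statistically significant.

    The default threshold of 0.001 is an illustrative choice only.
    """
    return expected_chance_hits(bit_score, query_len, db_len) < threshold


if __name__ == "__main__":
    # Hypothetical search: a 300-residue query against 100 million residues.
    for score in (30.0, 50.0):
        e_value = expected_chance_hits(score, 300, 100_000_000)
        print(f"bit score {score}: E = {e_value:.3g}, "
              f"significant = {looks_significant(score, 300, 100_000_000)}")
```

Because E grows with the size of the search space, the same score can be significant in a small database and unremarkable in a large one.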

Keywords: similarity; homology; FASTA; BLAST; database searches; protein sequence comparison; DNA sequence comparison


Further Reading

Altschul SF, Boguski MS, Gish W and Wootton JC (1994) Issues in searching molecular sequence databases. Nature Genetics 6: 119–129.

Pearson WR and Wood TC (2001) Statistical significance in biological sequence comparison. In: Balding DJ, Bishop M and Cannings C (eds.) Handbook of Statistical Genetics, pp. 39–65. Chichester, UK: Wiley.

How to Cite
Pearson, William R (Sep 2005) Similarity Search. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1038/npg.els.0005262]