Alignment: Statistical Significance


Computing the probability that an observed alignment between two protein or DNA sequences could have arisen by chance helps to distinguish genuinely related (homologous) sequences from chance similarities.
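
As a rough illustration of what such a computation can look like, the sketch below uses the Karlin–Altschul approximation for ungapped local alignments of random sequences (Karlin and Altschul, 1990): the expected number of chance alignments scoring at least S is E = K·m·n·exp(−λS), and the corresponding P-value is 1 − exp(−E). The function names and the numerical values of λ and K are assumptions chosen for illustration only; in practice the parameters depend on the scoring scheme and sequence composition, and for gapped alignments they are usually estimated empirically.

```python
import math


def karlin_altschul_evalue(score, m, n, lam, K):
    """Expected number of chance local alignments with score >= `score`
    when comparing random sequences of lengths m and n (ungapped case)."""
    return K * m * n * math.exp(-lam * score)


def pvalue_from_evalue(evalue):
    """Probability of seeing at least one chance alignment, treating the
    number of high-scoring chance alignments as Poisson with mean `evalue`.
    expm1 keeps precision when evalue is very small."""
    return -math.expm1(-evalue)


# Illustrative (assumed) parameters; real values depend on the scoring
# matrix, gap penalties and residue frequencies.
LAMBDA = 0.318
K = 0.13

evalue = karlin_altschul_evalue(score=80, m=250, n=300, lam=LAMBDA, K=K)
pvalue = pvalue_from_evalue(evalue)
print(f"E-value: {evalue:.3g}, P-value: {pvalue:.3g}")
```

For small E the P-value and E-value are nearly equal, which is why database search programs often report only one of the two.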

Keywords: sequence alignment; statistical significance; homology; DNA; protein sequence


References

Altschul SF and Gish W (1996) Local alignment statistics. Methods in Enzymology 266: 460–480.

Arratia RA, Morris P and Waterman MS (1988) Stochastic scrabble: large deviations for sequences with scores. Journal of Applied Probability 25: 106–119.

Arslan AN, Egecioglu O and Pevzner PA (2001) A new approach to sequence comparison: normalized sequence alignment. Bioinformatics 17(4): 327–337.

Durbin R, Eddy S, Krogh A and Mitchison G (1998) Biological Sequence Analysis. Cambridge, UK: Cambridge University Press.

Karlin S and Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 87(6): 2264–2268.

Korf I, Flicek P, Duan D and Brent MR (2001) Integrating genomic homology into gene structure prediction. Bioinformatics 17(supplement 1): S140–148.

Mott RF (1992) Maximum‐likelihood estimation of the statistical distribution of Smith–Waterman local sequence similarity scores. Bulletin of Mathematical Biology 54: 59–75.

Mott RF (2000) Accurate formula for P‐values of gapped local sequence and profile alignments. Journal of Molecular Biology 300: 649–659.

Mott RF and Tribe R (1999) Approximate statistics of gapped alignments. Journal of Computational Biology 6: 91–112.

Olsen R, Bundschuh R and Hwa T (1999) Rapid assessment of extremal statistics for gapped local alignment. Proceedings of the International Conference on Intelligent Systems in Molecular Biology, pp. 211–222.

Park J, Teichmann SA, Hubbard T and Chothia C (1997) Intermediate sequences increase the detection of homology between sequences. Journal of Molecular Biology 273(1): 349–354.

Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. Journal of Molecular Biology 276(1): 71–84.

Schaffer AA, Aravind L, Madden TL, et al. (2001) Improving the accuracy of PSI‐BLAST protein database searches with composition‐based statistics and other refinements. Nucleic Acids Research 29(14): 2994–3005.

Spang R and Vingron M (2001) Limits of homology detection by pairwise sequence comparison. Bioinformatics 17(4): 338–342.

Waterman MS and Vingron M (1994) Rapid and accurate estimates of statistical significance for sequence data base searches. Proceedings of the National Academy of Sciences of the United States of America 91(11): 4625–4628.

Further Reading

Altschul SF, Bundschuh R, Olsen R and Hwa T (2001) The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Research 29(2): 351–361.

Smith TF and Waterman MS (1981) Identification of common molecular subsequences. Journal of Molecular Biology 147: 195–197.

Waterman MS (1995) Introduction to Computational Biology: Maps, Sequences and Genomes. Boca Raton, FL: CRC Press.

Web Links



SMART http://smart.embl‐




How to Cite

Mott, Richard (Sep 2005) Alignment: Statistical Significance. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1038/npg.els.0005264]