Sequence Alignment

Abstract

Sequence comparison, or alignment, is the cornerstone of bioinformatics, providing the basis for sequence database searching, three‐dimensional structure modelling and evolutionary studies. A sequence alignment shows how a set of sequences may be related by identifying the structurally and functionally equivalent residues common to all the sequences and arranging them in columns.

Keywords: pairwise alignment; multiple alignment; scoring matrices; alignment statistics; dot plots

Figure 1.

Alignment of six scorpion toxin proteins. Conserved positions are shown in bold. Gaps are represented by dashes between the letter strings. The secondary structure elements of the scorpion Leiurus quinquestriatus hebraeus protein (SCXA_LEIQH) are shown below the alignment. Right arrow: β‐sheet; coil: α‐helix.
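The column view described in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the method used to prepare the figure: the function name `conserved_columns` and the three toy sequences are hypothetical, and a column is treated as conserved only when every sequence carries the same residue there and none has a gap.

```python
from typing import List


def conserved_columns(rows: List[str], gap: str = "-") -> List[int]:
    """Return 0-based indices of columns where all rows share one residue.

    Assumption: `rows` are the rows of a multiple alignment, already padded
    with `gap` characters so that they all have the same length.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    conserved = []
    for col in range(width):
        residues = {row[col] for row in rows}
        # Conserved: exactly one distinct symbol, and that symbol is not a gap.
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


# Toy alignment (hypothetical sequences, not the scorpion toxins of Figure 1).
toy = ["MK-CY", "MKACY", "MR-CY"]
print(conserved_columns(toy))
```

Stricter or looser definitions of conservation (for example, allowing chemically similar residues) are possible; the sketch uses strict identity only.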

Figure 2.

PAM250 matrix. Substitution scores for amino acids.

Figure 3.

Dynamic programming alignment matrices for global (a) and local (b) alignments of two DNA sequences. Per cent identity scores for each alignment are calculated by dividing the number of identical residues aligned by the total number of residues aligned.
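As a rough sketch of what this caption describes, the Python below fills a dynamic programming matrix for either a global or a local alignment and computes per cent identity from two aligned rows. Several things here are assumptions made for illustration: the function names, the scoring values (match +1, mismatch -1, linear gap penalty -2), and the reading of "total number of residues aligned" as the number of columns in which both sequences have a residue rather than a gap.

```python
from typing import List


def dp_matrix(s: str, t: str, match: int = 1, mismatch: int = -1,
              gap: int = -2, local: bool = False) -> List[List[int]]:
    """Fill the alignment score matrix for sequences s and t.

    local=False: global alignment, first row and column carry gap penalties.
    local=True:  local alignment, scores are never allowed to fall below 0.
    """
    rows, cols = len(s) + 1, len(t) + 1
    m = [[0] * cols for _ in range(rows)]
    if not local:
        for i in range(1, rows):
            m[i][0] = i * gap
        for j in range(1, cols):
            m[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = m[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            best = max(diag, m[i - 1][j] + gap, m[i][j - 1] + gap)
            m[i][j] = max(best, 0) if local else best
    return m


def best_score(m: List[List[int]], local: bool = False) -> int:
    """Global: bottom-right cell. Local: highest cell anywhere in the matrix."""
    return max(max(row) for row in m) if local else m[-1][-1]


def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Identical aligned residues divided by aligned residue pairs, times 100."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


print(best_score(dp_matrix("ACGT", "AGT")))
print(percent_identity("ACGT-A", "ACCTGA"))
```

The sketch omits the traceback step that recovers the alignment itself from the matrix, and real tools typically use affine rather than linear gap penalties.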

Figure 4.

Dot plot of a chicken tyrosine‐protein kinase protein (CSK_CHICK) compared to a Drosophila SH2–SH3 adaptor protein (DRK_DROME).
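A dot plot of the kind shown in this figure can be approximated with a simple sliding-window comparison. The sketch below is illustrative only: the function names, the window size and the match threshold are assumptions, it counts exact identities rather than using a substitution matrix, and it is not the procedure used to produce the figure.

```python
from typing import List


def dot_plot(s: str, t: str, window: int = 3, min_matches: int = 3) -> List[List[bool]]:
    """Return a grid where cell [i][j] is True if the windows starting at
    s[i] and t[j] share at least `min_matches` identical positions."""
    grid = []
    for i in range(len(s) - window + 1):
        row = []
        for j in range(len(t) - window + 1):
            matches = sum(1 for k in range(window) if s[i + k] == t[j + k])
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid: List[List[bool]]) -> str:
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# Toy example: comparing a short string with itself gives a main diagonal.
print(render(dot_plot("ABC", "ABC", window=1, min_matches=1)))
```

Similar regions appear as diagonal runs of dots; a larger window with a threshold below the window size filters out chance matches while tolerating some mismatches.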

References

Altschul SF and Gish W (1996) Local alignment statistics. Methods in Enzymology 266: 460–480.

Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.

Dayhoff M, Schwartz RM and Orcutt BC (1978) A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure, vol. 5 (supplement 3), pp. 345–358. Silver Spring, MD: National Biomedical Research Foundation.

Doolittle RF (1986) Of Urfs and Orfs: Primer on How to Analyze Derived Amino Acid Sequences. Mill Valley, CA: University Science Books.

Feng DF and Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25: 351–360.

Needleman SB and Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443–453.

Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. Journal of Molecular Biology 276: 71–84.

Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America 85: 2444–2448.

Sankoff D (1975) Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics 28: 35–42.

Smith TF and Waterman MS (1981) Identification of common molecular subsequences. Journal of Molecular Biology 147: 195–197.

States DJ and Boguski MS (1991) Similarity and homology. In: Gribskov M and Devereux J (eds.) Sequence Analysis Primer, pp. 92–124. New York, NY: Stockton Press.

Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and matrix choice. Nucleic Acids Research 22: 4673–4680.

Further Reading

Altschul SF (1991) Amino acid substitution matrices from an information theoretic perspective. Journal of Molecular Biology 219: 555–565.

Apostolico A and Giancarlo R (1998) Sequence alignment in molecular biology. Journal of Computational Biology 5: 173–196.

Benner SA, Cohen MA and Gonnet GH (1993) Empirical and structural models for insertions and deletions in the divergent evolution of proteins. Journal of Molecular Biology 229: 1065–1082.

Durbin R, Eddy S, Krogh A and Mitchison G (1999) Pairwise alignment. In: Durbin R (ed.) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, pp. 12–45. Cambridge, UK: Cambridge University Press.

Gotoh O (1999) Multiple sequence alignment: algorithms and applications. Advances in Biophysics 36: 159–206.

Henikoff S and Henikoff JG (1993) Performance evaluation of amino acid substitution matrices. Proteins 17: 49–61.

Henikoff S (1994) Comparative sequence analysis: finding genes. In: Smith DW (ed.) Biocomputing, Informatics and Genome Projects, pp. 87–117. New York, NY: Academic Press.

Smith TF (1999) The art of matchmaking: sequence alignment methods and their structural implications. Structure with Folding and Design 7: R7–R12.

Vogt G, Etzold T and Argos P (1995) An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. Journal of Molecular Biology 249: 816–831.

Waterman MS (1995) Dynamic programming alignment of two sequences. In: Waterman MS (ed.) Introduction to Computational Biology: Maps, Sequences and Genomes, pp. 183–232. London, UK: Chapman & Hall/CRC Press.

Yona G and Brenner SE (2000) Comparison of protein sequences and practical database searching. In: Higgins DG and Taylor WR (eds.) Bioinformatics: Sequence, Structure and Databanks. A Practical Approach, pp. 167–190. Oxford, UK: Oxford University Press.

Web Links

Blocks database http://www.blocks.fhcrc.org/

How to Cite

Thompson, Julie D, and Poch, Olivier (Jul 2006) Sequence Alignment. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005318]