Multiple Alignment

Abstract

Multiple alignment is a powerful integrative tool that addresses a variety of biological problems, ranging from the detection of key functional residues to the evolution of a protein family. Traditionally, a multiple alignment was constructed as a series of pairwise alignments; however, the recent application of new computational techniques to the multiple alignment problem has led to a number of new developments.

Keywords: progressive alignment; iterative alignment; hidden Markov model; genetic algorithm; objective function

Figure 1.

The basic progressive alignment procedure, exemplified by a set of five immunoglobulin‐like domains. The sequence names are from the SWISS‐PROT or Protein Data Bank (PDB) databases: 1HNF, human cell adhesion (CD2) protein; CD2_HORSE, horse cell adhesion protein; CD2_RAT, rat cell adhesion protein; MYPS_HUMAN, human myosin‐binding protein; 1WIT, nematode twitchin muscle protein. The first step involves aligning all possible pairs of sequences in order to determine the distances between them. A guide tree is then created and is used to determine the order of the multiple alignment. First, the human and horse CD2 sequences are aligned. These two sequences are then aligned with the rat CD2 sequence. Finally, the myosin‐binding protein sequence is aligned with the twitchin sequence, before being merged with the alignment of the three CD2 sequences. The secondary structure elements of the immunoglobulin‐like domains from the human CD2 (1HNF) and the nematode twitchin (1WIT) proteins are shown above and below the alignment (right arrow, beta sheet; coil, alpha helix).
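
The first two steps of the procedure in Figure 1 (pairwise distances, then a guide tree that fixes the merge order) can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular program: it assumes a simple match/mismatch score with a linear gap penalty, takes the distance between two sequences as one minus their fractional identity, and builds the guide tree by average-linkage (UPGMA) clustering. The function names and the five toy sequences are hypothetical, and the sketch reports only the order in which sequences and groups would be merged, not the final alignment.

```python
from itertools import combinations

def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch, linear gap cost) and
    return the fraction of alignment columns holding identical residues."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting columns and identical pairs.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return same / cols if cols else 1.0

def guide_tree_order(seqs):
    """Return the list of merges (cluster, cluster) in the order a
    progressive aligner would perform them. Clusters are tuples of names.
    Assumption: UPGMA (average linkage) on 1 - fractional identity."""
    dist = {frozenset(pair): 1.0 - nw_identity(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def average_distance(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical toy sequences: A, B and C stand in for three close
# homologues, D and E for a more distant pair. They are not real proteins.
toy = {
    "A": "ACDEFGHIKL",
    "B": "ACDEFGHIKV",
    "C": "ACDEYGHWKV",
    "D": "WWPLMNQSTR",
    "E": "WWPLMNAYTC",
}

if __name__ == "__main__":
    for step, (left, right) in enumerate(guide_tree_order(toy), start=1):
        print(step, left, "+", right)
```

With these toy sequences the closest pair (A and B) is merged first, C then joins that pair, D and E are paired next, and the two groups are merged last, which mirrors the order described in the caption. A full progressive aligner would, at each merge, align the two groups as profiles rather than simply recording the order.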

Further Reading

Baxevanis AD (1998) Practical aspects of multiple sequence alignment. Methods in Biochemical Analysis 39: 172–188.

Durbin R, Eddy S, Krogh A and Mitchison G (1999) Multiple sequence alignment methods. In: Durbin R (ed.) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, pp 134–159. Cambridge, UK: Cambridge University Press.

Duret L and Abdeddaim S (2000) Multiple alignments for structural, functional, or phylogenetic analysis of homologous sequences. In: Higgins DG and Taylor WR (eds.) Bioinformatics: Sequence, Structure, and Databanks: A Practical Approach, pp 51–76. Oxford, UK: Oxford University Press.

Gonnet GH, Korostensky C and Benner S (2000) Evaluation measures of multiple sequence alignments. Journal of Computational Biology 7: 261–276.

Gotoh O (1999) Multiple sequence alignment: algorithms and applications. Advances in Biophysics 36: 159–206.

Higgins DG and Taylor WR (2000) Multiple sequence alignment. Methods in Molecular Biology 143: 1–18.

Hirosawa M, Totoki Y, Hoshida M and Ishikawa M (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Computer Applications in the Biosciences 11: 13–18.

Phillips A, Janies D and Wheeler W (2000) Multiple sequence alignment in phylogenetic analysis. Molecular Phylogenetics and Evolution 16: 317–330.

How to Cite

Thompson, Julie, and Poch, Olivier (Sep 2005) Multiple Alignment. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1038/npg.els.0005258]