Needleman-Wunsch Algorithm for Sequence Similarity Searches
Needleman, S. B., Wunsch, C. D., J. Mol. Biol. (1970) 48:443-453
- General algorithm for sequence comparison
- Fundamental principle - to calculate the alignment score S(i,j) you only need to enumerate and score all ways in which one
aligned pair can be added to a shorter alignment to produce an alignment of the first i residues of seq1 and the first j residues of
seq2
- All possible pairs are represented by a two-dimensional array,
and all possible comparisons are represented by pathways through this
array
- Global alignments ... i.e. every residue of the two sequences has to participate - therefore will not detect motif
or active site homology alone
Three Main Steps
- 1. Assign similarity scores
- A numerical value (score) is assigned to every cell in the array depending on similarity/dissimilarity
- Similarity scores may be simple, or related to chemical similarities or frequency of observed substitutions
- 2. Score pathways through array
- For each cell want to know the maximum possible score for an alignment ending at that point
- Cumulative score by adding in a path through the matrix
- Searches subrow and subcolumn for the highest score
- Gap penalty independent of the length of the gap
- The best match is the pathway with the highest score
- Maximum match = largest number of amino acids of one protein
that can be matched with those of another protein while allowing for
all possible deletions
- 3. Construct an alignment
Smith-Waterman Algorithm
Smith, T. F., Waterman, M. S., J. Mol. Biol. (1981) 147:195-197
- Based on N-W Algorithm
- Instead of looking at each sequence in its entirety, this compares segments of all possible lengths and chooses whichever
optimise the similarity measure (local alignments)
- Comparing two sequences A (=a_{1} a_{2} a_{3} ... a_{n}) and B
(=b_{1} b_{2} b_{3} ... b_{m})
- H_{ij}=max{H_{i-1, j-1} +s(a_{i},b_{j}),
max{H_{i-k,j} -W_{k}}, max{H_{i, j-l} -W_{l}}, 0}
- H_{ij} is the maximum similarity of two segments ending in a_{i} and b_{j}
respectively.
- Including the alternative that the similarity score be zero in the expression for H_{ij}
allows the local alignment to restart at any pair of aligned residues.
In addition, it makes the calculation much more sensitive to the
precise match and mismatch scores and gap penalties.
Needleman-Wunsch, Smith-Waterman
Presentaion
aoife | BLAST
| FastA