Unit 3 Bioinformatics
1. SUM OF PAIRS (SP) METHOD
The Sum-of-pairs method is a dynamic programming method. In this method, instead of aligning two sequences at a time, 3 or more sequences are aligned simultaneously with dynamic programming. In principle the technique is applicable to any number of sequences, but in practice it is limited to a few because it is computationally expensive in both time and memory.
Process
Standard dynamic programming is first used on all pairs of query sequences. Then the "alignment space" is filled in by considering possible matches or gaps at intermediate positions. Finally, an alignment is constructed that combines the individual two-sequence alignments.
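The sum-of-pairs score that this method maximises can be sketched as follows: every column of a multiple alignment is scored by adding up the scores of all pairs of characters in that column. This minimal Python sketch only scores an alignment that already exists; it does not search for the best one. The scores +1 (identity), -1 (mismatch) and -2 (gap) are borrowed from the example values these notes give for dynamic programming, and scoring a gap-gap pair as 0 is an assumption.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters taken from the same alignment column."""
    if x == "-" and y == "-":
        return 0            # assumption: a gap aligned with a gap costs nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sp_score(alignment):
    """Sum-of-pairs score: add pair scores over all sequence pairs in every column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total

print(sp_score(["AC-", "ACG", "A-G"]))  # -3
```

In the example, the first column contributes +3 (three identical pairs) and each of the other two columns contributes -3 (one identity and two gaps).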
2. PROGRESSIVE METHOD
This is the most commonly used approach to MSA. ClustalW, ClustalX, PileUp and PIMA are progressive alignment programs based on DP methods. A slower but more accurate variant of the progressive method is T-Coffee.
ClustalW produces the best match for the selected sequences and arranges them so that the
identities, similarities and differences can be seen.
Process
3. ITERATIVE METHOD
Iterative methods begin by making an initial alignment of the sequences. These alignments are
then revised to give a more reasonable result. The objective of this approach is to improve the
overall alignment score.
Dialign, SAGA and BlockMaker are a few programs that use iterative methods for MSA.
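The "revise to improve the overall score" idea can be illustrated with a minimal hill-climbing sketch. It is not the algorithm used by Dialign, SAGA or BlockMaker (SAGA, for instance, uses a genetic algorithm). Starting from an initial alignment, it repeatedly moves a single gap one column left or right and keeps the move only if the sum-of-pairs score improves. The scores (+1/-1/-2, gap-gap = 0) and the function names are assumptions for illustration.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    if x == "-" and y == "-":
        return 0                      # assumption: gap-gap pairs score 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sp_score(rows):
    """Sum-of-pairs score of equal-length gapped rows."""
    return sum(pair_score(a, b)
               for column in zip(*rows)
               for a, b in combinations(column, 2))

def refine(alignment, max_rounds=100):
    """Iteratively shift single gaps left/right while the SP score improves."""
    rows = [list(r) for r in alignment]
    best = sp_score(rows)
    for _ in range(max_rounds):
        improved = False
        for row in rows:
            for i in range(len(row) - 1):
                # swapping a gap with a neighbouring residue moves the gap
                # without changing the order of the residues themselves
                if (row[i] == "-") != (row[i + 1] == "-"):
                    row[i], row[i + 1] = row[i + 1], row[i]
                    new = sp_score(rows)
                    if new > best:
                        best, improved = new, True
                    else:
                        row[i], row[i + 1] = row[i + 1], row[i]  # undo the move
        if not improved:
            break
    return ["".join(r) for r in rows], best

print(refine(["ACGT", "AG-T"]))  # (['ACGT', 'A-GT'], 1)
```

The initial alignment scores -1; moving the gap one column to the left lines up G with G and raises the score to 1. Hill climbing like this can stop at a local optimum, which is why real programs use more elaborate search strategies.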
profile matrices are then used to search other sequences for occurrences of the motif they characterize.
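A profile (position-specific) matrix of this kind can be sketched as follows: column frequencies from aligned motif instances are turned into log-odds scores, and a sequence is then scanned window by window for its best-scoring occurrence of the motif. The DNA alphabet, uniform background frequencies, pseudocount of 1 and the example motifs are all assumptions for illustration.

```python
import math
from collections import Counter

def build_pssm(motifs, alphabet="ACGT", pseudocount=1.0):
    """Log-odds profile matrix from aligned, equal-length motif instances."""
    background = 1.0 / len(alphabet)          # assumption: uniform background
    total = len(motifs) + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*motifs):
        counts = Counter(column)
        pssm.append({c: math.log2((counts[c] + pseudocount) / total / background)
                     for c in alphabet})
    return pssm

def scan(pssm, seq):
    """Return (best_start, best_score) of the highest-scoring window in seq.

    Characters outside the alphabet raise KeyError; seq must be at least as
    long as the motif.
    """
    width = len(pssm)

    def window_score(start):
        return sum(col[c] for col, c in zip(pssm, seq[start:start + width]))

    best = max(range(len(seq) - width + 1), key=window_score)
    return best, window_score(best)

pssm = build_pssm(["TATA", "TATA", "TATT"])
print(scan(pssm, "GGTATAGG"))  # best window starts at position 2
```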
5. DALI METHOD
The DALI method, or distance matrix alignment, is a fragment-based method for constructing structural alignments based on contact similarity patterns between successive hexapeptides in the query protein structures.
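The distance-matrix idea behind DALI can be illustrated with a simplified sketch. Each protein is reduced to the pairwise distances between its C-alpha atoms, and the distance pattern between two hexapeptides in one protein is compared with the corresponding pattern in another. The comparison used here is a plain mean absolute difference, not DALI's own similarity score. The coordinates are hypothetical toy values, and the function names are assumptions.

```python
import math

def distance_matrix(coords):
    """All pairwise C-alpha distances of one protein (n x n, symmetric)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def fragment_pattern(dm, i, j, size=6):
    """Distances between the hexapeptide starting at residue i and the one at j."""
    return [[dm[i + a][j + b] for b in range(size)] for a in range(size)]

def pattern_difference(p1, p2):
    """Mean absolute difference between two contact patterns (simplified score)."""
    diffs = [abs(x - y) for r1, r2 in zip(p1, p2) for x, y in zip(r1, r2)]
    return sum(diffs) / len(diffs)

# Hypothetical toy coordinates: 8 residues on a straight line, 3.8 apart
protein_a = [(3.8 * k, 0.0, 0.0) for k in range(8)]
# The same shape, translated in space
protein_b = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in protein_a]
# A "bent" chain: the last four residues turn by 90 degrees
protein_c = protein_a[:4] + [(3.8 * 3, 3.8 * (k - 3), 0.0) for k in range(4, 8)]

dm_a, dm_b, dm_c = (distance_matrix(p) for p in (protein_a, protein_b, protein_c))
same = pattern_difference(fragment_pattern(dm_a, 0, 2), fragment_pattern(dm_b, 0, 2))
bent = pattern_difference(fragment_pattern(dm_a, 0, 2), fragment_pattern(dm_c, 0, 2))
print(round(same, 6), round(bent, 2))
```

Because only internal distances are used, the translated copy gives a difference of (numerically) zero, while the bent chain does not. This is why a distance-matrix comparison needs no prior superposition of the two structures.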
6. SSAP METHOD
APPLICATIONS OF MSA
1. MSAs can be used to find regions that are similar across all of the sequences and define a conserved consensus pattern or domain.
2. MSAs are powerful tools for identifying new members of the aligned group.
3. It is possible to query databases of MSAs with single sequences and to query sequence databases with multiple sequences.
4. Design of degenerate PCR primers is another major application for multiple alignments.
5. MSAs are used to study the evolution and sequence conservation of a group of genes, i.e. phylogenetic analysis.
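Application 1 can be sketched as deriving a consensus from an existing alignment: for each column, the most frequent residue is reported if it occurs in at least a chosen fraction of the sequences. The 50% threshold, the use of "x" for an unconserved column and ignoring gaps when counting are assumptions of this sketch.

```python
from collections import Counter

def consensus(alignment, threshold=0.5):
    """Majority-rule consensus of equal-length gapped rows."""
    n = len(alignment)
    result = []
    for column in zip(*alignment):
        residues = Counter(c for c in column if c != "-")   # gaps are not counted
        if not residues:
            result.append("-")                              # all-gap column
            continue
        residue, count = residues.most_common(1)[0]
        # keep the residue only if it is conserved in enough sequences
        result.append(residue if count / n >= threshold else "x")
    return "".join(result)

print(consensus(["ACGT", "ACGA", "ACTT"]))  # ACGT
```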
PAIR-WISE ALIGNMENT
Pair-wise sequence alignment methods are used to find the best-matching piecewise (local) or global alignments of two query sequences. Pair-wise alignments can only be used between two sequences at a time, but they are efficient to calculate and are often used for methods that do not require extreme precision (such as searching a database for sequences with high similarity to a query).
The three primary methods of producing pair-wise alignments are
1. Dot-matrix methods,
2. Dynamic programming, and
3. Word methods.
All three pair-wise methods use the longest subsequence that occurs in both the query sequence and the existing sequence, i.e. the 'Maximal Unique Match' (MUM). Longer MUM sequences typically reflect closer relatedness.
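A brute-force sketch for finding the longest MUM between two short sequences is shown below. It searches, from the longest length downwards, for a substring that occurs exactly once in each sequence. The longest such substring cannot be extended on either side and still occur in both sequences, so it is automatically maximal. This is suitable only for toy inputs (it is roughly cubic in the sequence length). Tools that compute MUMs on whole genomes use suffix trees instead. The function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))

def longest_mum(a, b):
    """Longest substring that occurs exactly once in a and exactly once in b."""
    for length in range(min(len(a), len(b)), 0, -1):
        for i in range(len(a) - length + 1):
            sub = a[i:i + length]
            if count_occurrences(a, sub) == 1 and count_occurrences(b, sub) == 1:
                return sub
    return ""   # no unique shared substring

print(longest_mum("GGCTT", "GGATTGA"))  # GG
```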
1. DOT-MATRIX METHODS
This method was introduced by Gibbs & McIntyre in 1970. The dot-matrix approach is qualitative and conceptually simple, but time-consuming to analyze on a large scale. With this method it is easy to visually identify mutations such as insertions, deletions, repeats, or inverted repeats.
Process:
To construct a dot-matrix plot, the two sequences are written along the X and Y axes of a two-dimensional matrix.
A dot is placed at any point where the characters in the corresponding row and column match.
It is a type of recurrence plot.
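For example, consider the following 2 sequences:
GGCTT
GGATTGA
Most of the dots fall along the matrix's main diagonal (the G-G, G-G, T-T and T-T matches), with a break at the C/A mismatch; two identical sequences would appear as a single unbroken line along the main diagonal. The alignment with the greatest number of identities will be the optimal alignment.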
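The construction steps above can be sketched in a few lines of Python, using the two example sequences from these notes. A `*` marks a matching pair of characters; the first sequence runs down the rows (Y axis) and the second across the columns (X axis). Plotting exact single-character matches with no window or threshold is a simplifying assumption; practical dot-plot tools often filter noise with a sliding window.

```python
def dot_matrix(seq_y, seq_x):
    """Boolean matrix: True where seq_y[i] == seq_x[j]."""
    return [[cy == cx for cx in seq_x] for cy in seq_y]

def render(seq_y, seq_x, matrix):
    """Text picture of the dot plot, '*' for a match and '.' otherwise."""
    lines = ["  " + " ".join(seq_x)]
    for cy, row in zip(seq_y, matrix):
        lines.append(cy + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)

m = dot_matrix("GGCTT", "GGATTGA")
print(render("GGCTT", "GGATTGA", m))
# Output:
#   G G A T T G A
# G * * . . . * .
# G * * . . . * .
# C . . . . . . .
# T . . . * * . .
# T . . . * * . .
```

Four of the five main-diagonal cells are dots, broken only at the C/A position; the dots in the sixth column are off-diagonal G matches.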
Application:
1. To visually compare two sequences and detect the regions of close similarity between them.
2. DYNAMIC PROGRAMMING METHOD
Dynamic programming is a method for breaking down the alignment of sequences into small parts. It is comparable to moving across a dot matrix and keeping track of all the matching pairs. It involves adding up the pairs that lie along a diagonal and subtracting a penalty when insertions are necessary to maintain an alignment.
Dynamic programming can provide global (via the Needleman-Wunsch algorithm) or local sequence alignments (via the Smith-Waterman algorithm).
Process:
• All possible alignments of the 2 sequences are considered.
• An alignment matrix is drawn using the global or local alignment method.
• To calculate the score at any cell of the alignment matrix, the three adjacent cells (diagonal, above and left), each representing the best partial alignment so far, are taken and the following calculation is done, keeping the highest of the three results:
  A gap penalty is subtracted for an insertion or deletion (e.g. -2).
  An identity or a mismatch is scored (e.g. +1 or -1).
• The final step is to find the cell giving the highest score (the bottom-right cell for a global alignment, the highest-scoring cell anywhere for a local alignment) and to trace back the path of scores that led to it. The optimal alignment is the path which gives the highest total score.
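The process above can be sketched as a minimal global alignment (Needleman-Wunsch) in Python, using the example scores from these notes: +1 for an identity, -1 for a mismatch and -2 for a gap. The traceback starts at the bottom-right cell, which holds the score of the best global alignment. A Smith-Waterman (local) version would additionally floor every cell at 0 and trace back from the highest-scoring cell; that variant is not shown. When several paths tie, this sketch prefers the diagonal move, then a gap in the second sequence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # fill: each cell takes the best of its three neighbours
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # diagonal
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # traceback from the bottom-right cell
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The result has three identities and one gap: 3 x (+1) + (-2) = 1.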
3. WORD METHODS
Word methods, also known as k-tuple methods, are heuristic methods that are more efficient than dynamic programming. Word methods are best known for their implementation in the database search tools FASTA and the BLAST family.
FASTA and BLAST
BLAST was developed to provide a faster alternative to FASTA without sacrificing much accuracy.
Methodology of BLAST involves:
1) The process of finding initial words, called seeding, is carried out.
2) Words are formed as sets of 3 residues. After making words for the sequence of interest, neighborhood words are also assembled.
3) Once both words and neighborhood words are assembled and compiled, they are compared to the sequences in the database in order to find matches.
4) Alignment scores are calculated and compared with a pre-determined threshold score T.
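The seeding steps above can be sketched as follows: the query is cut into overlapping 3-letter words, each word is expanded into its neighborhood (all words scoring at least T against it), and a database sequence is searched for exact occurrences of those words. To keep the neighborhood small, this sketch assumes a DNA alphabet and a toy score (+2 identity, -1 mismatch) in place of a protein substitution matrix such as BLOSUM62. The extension of seeds into full alignments is not shown, and the function names are hypothetical.

```python
from itertools import product

def word_score(w1, w2, match=2, mismatch=-1):
    """Toy substitution score for two equal-length words."""
    return sum(match if x == y else mismatch for x, y in zip(w1, w2))

def neighborhood(word, T, alphabet="ACGT"):
    """All words of the same length that score at least T against `word`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return {w for w in candidates if word_score(word, w) >= T}

def seed_hits(query, db, k=3, T=6, alphabet="ACGT"):
    """(query_pos, db_pos) pairs where a neighborhood word of a query word occurs in db."""
    index = {}                                   # database word -> start positions
    for j in range(len(db) - k + 1):
        index.setdefault(db[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for w in neighborhood(query[i:i + k], T, alphabet):
            hits.extend((i, j) for j in index.get(w, []))
    return sorted(hits)

print(len(neighborhood("ACG", T=3)))   # 10: the word itself plus 9 one-mismatch words
print(seed_hits("ACGT", "TTACGTT"))    # [(0, 2), (1, 3)]
```

Raising T shrinks the neighborhood (with T = 6 only the exact word survives), which trades sensitivity for speed.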
These methods are especially useful in large-scale database searches. Word methods identify a series of short, nonoverlapping subsequences ("words") in the query sequence that are then matched to candidate database sequences.
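The word (k-tuple) lookup can also be sketched in the style of FASTA's first step, simplified. Every k-letter word of the database sequence is indexed, each word of the query is looked up, and hits are counted per diagonal (database position minus query position). A diagonal with many hits marks a candidate region for a more careful alignment. This sketch indexes every word position (overlapping words), and k = 2 and the example sequences are assumptions; it is not the actual FASTA implementation.

```python
from collections import Counter, defaultdict

def ktuple_diagonals(query, db, k=2):
    """Count k-tuple hits on each diagonal (db_pos - query_pos)."""
    index = defaultdict(list)                    # database word -> start positions
    for j in range(len(db) - k + 1):
        index[db[j:j + k]].append(j)
    diagonals = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            diagonals[j - i] += 1
    return diagonals

print(ktuple_diagonals("GGCTT", "GGATTGA").most_common(1))  # [(0, 2)]
```

For the dot-matrix example sequences, both word hits (GG and TT) fall on diagonal 0, matching the main diagonal seen in the dot plot.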
2. Gene Prediction: Importance and Methods
Gene prediction by computational methods, i.e. finding the location of protein-coding regions, is one of the essential issues in bioinformatics. Gene prediction basically means locating genes along a genome. Also called gene finding, it refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, and other functional elements, such as regulatory regions.