Sequence Alignment: Scoring Matrices
Sequence alignment
Scoring Matrices
Dynamic Programming Solution
Scoring matrices
• The number of changes of each amino acid into every other amino acid
was counted, considering only changes that occurred without affecting
the function of the protein.
• The amino acid exchange counts and mutability values were used to
generate a 20 x 20 mutation probability matrix representing all
possible amino acid changes.
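The construction described above can be sketched for a toy three-letter alphabet. This is a minimal illustration, not the original Dayhoff procedure in full: the function name, the toy counts, mutabilities and frequencies are all made up, and the scaling to 1% expected change follows the usual textbook formulation of a 1 PAM matrix.

```python
def mutation_probability_matrix(counts, mutability, freqs, target=0.01):
    """Build a column-stochastic mutation probability matrix.

    counts[i][j]  : symmetric exchange counts between residues i and j (i != j)
    mutability[j] : relative mutability of residue j
    freqs[j]      : background frequency of residue j (sums to 1)
    target        : expected fraction of residues changed (0.01 for 1 PAM)
    """
    n = len(counts)
    # Scale factor chosen so the frequency-weighted chance of change equals target.
    lam = target / sum(freqs[j] * mutability[j] for j in range(n))
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_total = sum(counts[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                # Probability that residue j stays unchanged.
                M[i][j] = 1.0 - lam * mutability[j]
            else:
                # Probability that residue j is replaced by residue i.
                M[i][j] = lam * mutability[j] * counts[i][j] / col_total
    return M


# Toy data (assumed values, for illustration only).
counts = [[0, 30, 10],
          [30, 0, 20],
          [10, 20, 0]]
mutability = [1.0, 0.8, 0.5]
freqs = [0.5, 0.3, 0.2]
M = mutation_probability_matrix(counts, mutability, freqs)
```

Each column of `M` sums to 1, and the real matrix has the same structure with 20 rows and columns instead of 3.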
Percent Accepted Mutation
(PAM or Dayhoff) Matrices
• Since each change is assumed to be independent of previous
mutational events, the PAM1 matrix can be multiplied by
itself N times to give the transition matrix for
sequences that have undergone N PAM units of change
(N accepted mutations per 100 residues).
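Repeated multiplication of PAM1 can be illustrated with a small sketch. The 3 x 3 matrix below is a hypothetical stand-in for the real 20 x 20 PAM1 matrix (its values are invented), and `mat_mul` and `mat_pow` are helper names introduced only for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Multiply matrix m by itself `power` times (power >= 0)."""
    n = len(m)
    # Start from the identity matrix.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM1-like matrix; each column sums to 1.
pam1 = [[0.990, 0.004, 0.003],
        [0.006, 0.992, 0.002],
        [0.004, 0.004, 0.995]]

pam250 = mat_pow(pam1, 250)
```

The diagonal entries of `pam250` are much smaller than those of `pam1`, reflecting that a residue is less likely to be unchanged after more evolutionary divergence.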
PAM | BLOSUM
Built from global alignments | Built from local alignments
Built from a small amount of data | Built from a vast amount of data
Counting is based on minimum replacement or maximum parsimony | Counting is based on groups of related sequences counted as one
Performs better for finding global alignments and remote homologs | Better for finding local alignments
Higher PAM series means more divergence | Lower BLOSUM series means more divergence
Nucleic Acid Scoring Matrices
• Two mutation models (models of nucleotide evolution)
– Uniform mutation rates (Jukes-Cantor)
– Two separate mutation rates (Kimura)
• Transitions
• Transversions
Generally, the rate of transitions is thought to be higher than the rate of
transversions.
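DNA Mutations
PURINES: A, G
PYRIMIDINES: C, T
Transitions exchange bases within the same class (A <-> G, C <-> T);
transversions exchange a purine for a pyrimidine or vice versa.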
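A nucleotide scoring matrix that treats transitions and transversions differently can be sketched as follows. The score values (1, -1, -2) are arbitrary placeholders chosen only so that transversions are penalized more than transitions, and the function names are introduced for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of bases as match, transition or transversion."""
    if x == y:
        return "match"
    # A transition stays within one chemical class of bases.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def nucleotide_matrix(match=1, transition=-1, transversion=-2):
    """Build a scoring matrix as a dict keyed by (base, base) pairs."""
    scores = {"match": match,
              "transition": transition,
              "transversion": transversion}
    return {(x, y): scores[substitution_type(x, y)]
            for x in "ACGT" for y in "ACGT"}


matrix = nucleotide_matrix()
```

Setting `transition` equal to `transversion` gives a uniform-rate scheme, whereas distinct values correspond to a model with two separate rates.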
Gap Penalties (General and Affine)
Pairwise alignment of retinol-binding protein (RBP)
and β-lactoglobulin
1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP
. ||| | . |. . . | : .||||.:| :
1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin
51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP
: | | | | :: | .| . || |: || |.
45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 lactoglobulin
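Constant Gap Penalty
Every gap position is charged the same penalty, so the overall penalty
for one large gap is the same as for many small gaps of the same total
length.
Affine Gap Penalty
Opening a gap is charged more than extending it, because you do not
want to add too much of a penalty for further extending the gap once it
is opened.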
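The difference between the two gap-penalty schemes can be seen in a short sketch. The penalty values are arbitrary, and the convention that the opening penalty covers the first gap position is an assumption; some tools charge the extension penalty for every position in addition to the opening penalty.

```python
def linear_gap_penalty(length, per_position=2):
    """Same cost for every gap position."""
    return per_position * length


def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Higher cost to open a gap, lower cost for each further position.

    Assumed convention: gap_open pays for the first position.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# One gap of length 6 versus three gaps of length 2.
one_large_linear = linear_gap_penalty(6)
three_small_linear = 3 * linear_gap_penalty(2)
one_large_affine = affine_gap_penalty(6)
three_small_affine = 3 * affine_gap_penalty(2)
```

With the per-position scheme the two scenarios cost the same, while the affine scheme makes one long gap cheaper than several short ones.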
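$E = K m n \, e^{-\lambda S}$
– E: expected number of alignments with score at least S that arise by chance
– S: alignment score
– m, n: lengths of the sequences
– K, λ: statistical parameters that act as natural scales
for the search space size and the scoring
system, respectively.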
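The formula can be evaluated directly. In this sketch the function name and all numeric values are placeholders chosen for illustration; real values of K and λ depend on the scoring matrix and gap penalties in use.

```python
import math


def expected_hits(score, m, n, K, lam):
    """Evaluate E = K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


# Placeholder parameters (assumed, not taken from any real scoring system).
K, lam = 0.1, 0.3
e_low = expected_hits(score=40, m=250, n=1_000_000, K=K, lam=lam)
e_high = expected_hits(score=60, m=250, n=1_000_000, K=K, lam=lam)
```

A higher score gives an exponentially smaller E, and doubling either sequence length doubles E because the search space `m * n` doubles.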
A search from a database