Bio Lec 4
Bioinformatics
Profiles and Progressive Alignment
By: Mirza A. Hammad
Profiles for families of sequences
can be built from MSAs
MSA (columns 1 to 3):
     1    2    3
     C    G    -
     A    A    T
     A    A    A
     -    A    -

Profile derived from the MSA:
     1    2    3
A   50%  75%  25%
C   25%   0%   0%
T    0%   0%  25%
G    0%  25%   0%
-   25%   0%  50%
• Profile: A table that lists the frequency of each amino acid at each
position of an aligned family of protein sequences
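A minimal sketch of how such a frequency table could be computed from an MSA. The function name build_profile and the fixed alphabet "ACGT-" are assumptions made for illustration; the four aligned rows are the small nucleotide example from this slide (CG-, AAT, AAA, -A-), with "-" counted as its own symbol.

```python
from collections import Counter

def build_profile(msa, alphabet="ACGT-"):
    """Return {symbol: [frequency in column 1, column 2, ...]}.

    `msa` is a list of equal-length aligned strings; "-" marks a gap.
    """
    n_rows = len(msa)
    n_cols = len(msa[0])
    if any(len(row) != n_cols for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = {symbol: [] for symbol in alphabet}
    for col in range(n_cols):
        # Count how often each symbol occurs in this alignment column.
        counts = Counter(row[col] for row in msa)
        for symbol in alphabet:
            profile[symbol].append(counts[symbol] / n_rows)
    return profile

msa = ["CG-", "AAT", "AAA", "-A-"]
profile = build_profile(msa)
for symbol, freqs in profile.items():
    print(symbol, " ".join(f"{f:.0%}" for f in freqs))
```

Each column of the resulting table sums to 1, because every row contributes exactly one symbol (residue or gap) to every column.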
where b 1
Insertion/deletion penalty
Gribskov et al. PNAS. 84 (13): 4355 (1987)
Profiles: Consensus Sequence
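Profile (columns 1 to 5), nonzero frequencies listed per row:
K   .75  .25  .50
L   .75  .75
M   .25  .25  .50  .25
-   .25  .25  .25

Sequence aligned to the profile columns:
Sequence:        K  K  L  -  L  M
Profile column:  1  -  2  3  4  5

Column 1 score: 0.75 s(K,K) + ...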
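The column score above reads as a frequency-weighted sum of substitution scores: each residue in the profile column contributes its frequency times s(residue, aligned sequence residue). A minimal sketch of that idea follows. The names column_score and toy_s are hypothetical, the +2/-1 scoring is an arbitrary stand-in for a real substitution matrix, and only the 0.75 for K comes from the slide; the remaining 0.25 is assigned to L purely for illustration.

```python
def column_score(column_freqs, residue, s):
    """Frequency-weighted substitution score of `residue` against one column.

    `column_freqs` maps each residue in the profile column to its frequency;
    `s(a, b)` is any substitution scoring function.
    """
    return sum(freq * s(a, residue) for a, freq in column_freqs.items())

def toy_s(a, b):
    # Hypothetical scoring: +2 for identical residues, -1 otherwise.
    return 2 if a == b else -1

# Hypothetical column: 0.75 for K is from the slide, the rest is made up.
column1 = {"K": 0.75, "L": 0.25}

# 0.75 * s(K,K) + 0.25 * s(L,K) = 0.75 * 2 + 0.25 * (-1) = 1.25
print(column_score(column1, "K", toy_s))
```

With a real matrix such as BLOSUM or PAM, toy_s would be replaced by a lookup; gap frequencies in the column would need a separate penalty term, which this sketch leaves out.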
Profile-to-sequence alignments
3. Gaps
• Positions in early alignments where gaps have been opened
receive locally reduced gap penalties
• Residue-specific gap penalties and locally reduced gap penalties in
hydrophilic regions encourage new gaps in potential loop regions
rather than regular secondary structure.
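The two locally reduced penalties described above can be sketched as a per-column gap-opening penalty. This is an illustration under stated assumptions, not ClustalW's actual rules: the function name, the scaling factors (0.5 and 2/3), the window of 5 residues, and the hydrophilic residue set are all assumed values, and residue-specific penalties are not modelled.

```python
HYDROPHILIC = set("DEGKNQPRS")  # assumed set of hydrophilic residues

def local_gap_open_penalties(alignment, base_gop=10.0, gap_factor=0.5,
                             hydrophilic_factor=2 / 3, window=5):
    """Return one gap-opening penalty per column of an existing alignment."""
    n_cols = len(alignment[0])

    # Columns covered by a run of at least `window` hydrophilic residues
    # in any single sequence.
    hydrophilic_cols = set()
    for row in alignment:
        run = 0
        for col, residue in enumerate(row):
            run = run + 1 if residue in HYDROPHILIC else 0
            if run >= window:
                hydrophilic_cols.update(range(col - window + 1, col + 1))

    penalties = []
    for col in range(n_cols):
        if any(row[col] == "-" for row in alignment):
            # A gap was already opened here in an earlier alignment.
            penalties.append(base_gop * gap_factor)
        elif col in hydrophilic_cols:
            # Likely loop region: make a new gap cheaper.
            penalties.append(base_gop * hydrophilic_factor)
        else:
            penalties.append(base_gop)
    return penalties

alignment = ["ALKDEGSVLA",
             "AL-DEGSVLA"]
print(local_gap_open_penalties(alignment))
```

In this toy alignment, column 3 already contains a gap and gets the lowest penalty, the following four columns sit in a hydrophilic stretch (KDEGS in the first sequence) and get a moderately reduced penalty, and the remaining columns keep the base value.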
Progressive Alignment: Discussion
• Strengths:
• Speed
• Progression biologically sensible (aligns using a tree)
• Weaknesses:
• No objective function.
• No way of quantifying whether or not the alignment is good
Problems with CLUSTALW
• ClustalW uses global alignment, which may not be accurate for all parts of
the sequences
• T-Coffee considers local similarity as well as global
Iterative alignment
• Programs:
• MultAlin
• PRRP
• DIALIGN