Bio Lec 4

This is related to bioinformatics

Uploaded by

Huma Tehreem

Introduction to Bioinformatics
Profiles and Progressive Alignment
By: Mirza A. Hammad
Profiles for families of sequences can be built from MSAs

MSA ("-" marks a gap):
1 2 3
C G -
A A T
A A A
- A -

Profile (frequency of each symbol in each column):
       1     2     3
A    50%   75%   25%
C    25%    0%    0%
T     0%    0%   25%
G     0%   25%    0%
-    25%    0%   50%

Note: While profiles can be used for any kind of sequence data, we'll focus on protein sequences.
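As a minimal sketch of how such a profile could be computed, the snippet below counts the symbols in each column of the four-sequence toy alignment from the slide and divides by the number of sequences. The function name `build_profile` and the use of "-" for gaps are assumptions made for illustration.

```python
from collections import Counter

def build_profile(msa):
    """Return one {symbol: frequency} dict per alignment column."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    n = len(msa)
    profile = []
    for column in zip(*msa):  # iterate over columns, not rows
        counts = Counter(column)
        profile.append({sym: cnt / n for sym, cnt in counts.items()})
    return profile

# Toy alignment from the slide; '-' marks a gap.
msa = ["CG-", "AAT", "AAA", "-A-"]
profile = build_profile(msa)
for pos, col in enumerate(profile, start=1):
    print(pos, col)
```

Symbols that never occur in a column are simply absent from that column's dictionary, which corresponds to the 0% entries in the table.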
Profiles

• Profile: a table that lists the frequency of each amino acid at each position of a protein sequence
  • Frequencies are calculated from an MSA containing a domain of interest
• Allows us to identify a consensus sequence
• The derived scoring scheme allows us to align a new sequence to the profile
• A profile can be used in database searches
  • Find new sequences that match the profile
• Profiles are also used to compute multiple alignments heuristically
  • Progressive alignment
Profiles: Position-Specific Scoring Matrix (PSSM)
• To compare a sequence to a profile, we need to assign a score to each amino acid at each position
• The profile score for amino acid a at position p is

$$M(p, a) = \sum_{b=1}^{20} f(p, b) \times s(a, b)$$

where
  • f(p,b) is the frequency of amino acid b at position p
  • s(a,b) is the score of the pair (a,b) (from, e.g., BLOSUM or PAM)
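The formula can be sketched in a few lines of code. The example below assumes a three-letter alphabet (K, L, M) and a made-up symmetric score table `TOY_SCORES`, which stands in for a real matrix such as BLOSUM or PAM. The function names are likewise hypothetical, and gap frequencies are left out for simplicity.

```python
# Assumed toy similarity scores; a real PSSM would use BLOSUM or PAM values.
TOY_SCORES = {
    ("K", "K"): 5, ("L", "L"): 4, ("M", "M"): 5,
    ("K", "L"): -2, ("K", "M"): -1, ("L", "M"): 2,
}

def s(a, b):
    """Substitution score for an unordered residue pair."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))

def pssm_score(column_freqs, a):
    """M(p, a) = sum over b of f(p, b) * s(a, b) for one profile column."""
    return sum(f * s(a, b) for b, f in column_freqs.items())

def build_pssm(profile, alphabet):
    """One {residue: M(p, residue)} row per profile position."""
    return [{a: pssm_score(col, a) for a in alphabet} for col in profile]

# Two profile columns: position 1 is 75% K / 25% M, position 2 is 75% L / 25% M.
profile = [{"K": 0.75, "M": 0.25}, {"L": 0.75, "M": 0.25}]
pssm = build_pssm(profile, "KLM")
for p, row in enumerate(pssm, start=1):
    print(p, row)
```

With these toy scores, M(1, K) = 0.75 × 5 + 0.25 × (−1) = 3.5, so a K in the new sequence scores well at position 1 even though the column is not perfectly conserved.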
Profiles: PSSM
[Figure: example profile (PSSM) with its insertion/deletion penalty labeled. Source: Gribskov et al., PNAS 84 (13): 4355 (1987)]
Profiles: Consensus Sequence

• A consensus residue C(p) is generated at each position of the profile to aid the display of alignments of target sequences with the profile
• The consensus residue c is the amino acid at p that has the highest score M(p,c)
  • c is the amino acid most mutationally similar to all the aligned residues of the probe sequences at p, rather than the most common one
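The difference between "highest-scoring" and "most common" is easiest to see in a small example. The sketch below uses an invented column (40% D, 30% I, 30% L) and an assumed toy score table in which I and L are similar to each other but dissimilar to D. All names and numbers are illustrative only.

```python
# Assumed toy similarity scores for three residues.
TOY_SCORES = {
    ("D", "D"): 6, ("I", "I"): 4, ("L", "L"): 4,
    ("D", "I"): -3, ("D", "L"): -4, ("I", "L"): 2,
}

def s(a, b):
    """Substitution score for an unordered residue pair."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))

def profile_score(column_freqs, a):
    """M(p, a) for a single profile column."""
    return sum(f * s(a, b) for b, f in column_freqs.items())

def consensus_residue(column_freqs, alphabet):
    """The residue c with the highest M(p, c)."""
    return max(alphabet, key=lambda a: profile_score(column_freqs, a))

column = {"D": 0.4, "I": 0.3, "L": 0.3}
most_common = max(column, key=column.get)
consensus = consensus_residue(column, "DIL")
print(most_common, consensus)  # prints: D I
```

Here D is the single most frequent residue, but I scores highest (about 0.6 versus 0.3 for D) because it is similar to the L residues that make up another 30% of the column.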
Aligning a sequence to a profile

MSA:
1 2 3 4 5
K L M - K
K L K L K
K M M L -
M L - L M

Profile:
       1     2     3     4     5
K    .75         .25         .50
L          .75         .75
M    .25   .25   .50         .25
-                .25   .25   .25

New sequence:
K K L L M

Align with profile (the new sequence opens a gap column between profile columns 1 and 2):
K K L - L M
1 - 2 3 4 5

Resulting MSA:
K K L - L M
K - L M - K
K - L K L K
K - M M L -
M - L - L M
Scoring a sequence-to-profile alignment
• Score each column separately according to the PSSM
• Each character contributes to the score, weighted by its frequency

Profile:
       1     2     3     4     5
K    .75         .25         .50
L          .75         .75
M    .25   .25   .50         .25
-                .25   .25   .25

Alignment:
K K L - L M
1 - 2 3 4 5

Column 1 score: 0.75 s(K,K) + 0.25 s(K,M)
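The column-by-column scoring can be sketched as follows for the alignment of K K L - L M against profile columns 1 - 2 3 4 5. Several things here are assumptions not given on the slide: the toy score table, a flat penalty of -2 for pairing a residue with a gap, a score of 0 for gap against gap, and treating the newly opened column as an all-gap profile column.

```python
GAP = "-"
GAP_PENALTY = -2  # assumed penalty for a residue paired with a gap
TOY_SCORES = {    # assumed toy values, not a real substitution matrix
    ("K", "K"): 5, ("L", "L"): 4, ("M", "M"): 5,
    ("K", "L"): -2, ("K", "M"): -1, ("L", "M"): 2,
}

def s(a, b):
    """Pair score, extended with the assumed gap rules."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))

def column_score(column_freqs, x):
    """Score of character x against one profile column, weighted by frequency."""
    return sum(f * s(x, b) for b, f in column_freqs.items())

def alignment_score(profile, aligned_seq, column_map):
    """Sum of column scores; column_map[i] is a profile index or None."""
    total = 0.0
    for x, p in zip(aligned_seq, column_map):
        col = {GAP: 1.0} if p is None else profile[p]
        total += column_score(col, x)
    return total

profile = [
    {"K": 0.75, "M": 0.25},
    {"L": 0.75, "M": 0.25},
    {"K": 0.25, "M": 0.50, GAP: 0.25},
    {"L": 0.75, GAP: 0.25},
    {"K": 0.50, "M": 0.25, GAP: 0.25},
]
aligned_seq = "KKL-LM"
column_map = [0, None, 1, 2, 3, 4]  # 'None' marks the newly opened column

column1 = column_score(profile[0], "K")  # 0.75*s(K,K) + 0.25*s(K,M)
total = alignment_score(profile, aligned_seq, column_map)
print(column1, total)
```

Under these assumed scores, column 1 contributes 0.75 × 5 + 0.25 × (−1) = 3.5 and the whole alignment scores 6.25. Different substitution values or gap rules would change the numbers but not the procedure.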
Profile-to-sequence alignments

• Optimum alignment can be found by dynamic programming
  • Extension of Needleman-Wunsch
• Spaces are only added to the MSA, never removed
  • Once a gap, always a gap
• Can align profiles to profiles
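A minimal sketch of the dynamic-programming idea is given below: a Needleman-Wunsch recurrence in which the "match" score is the frequency-weighted column score. It assumes a linear gap penalty and toy substitution scores (neither comes from the slides), and it returns the gapped sequence together with the profile column each position maps to, where `None` marks a new gap column that would have to be added to the MSA.

```python
GAP = "-"
GAP_PENALTY = -2.0  # assumed linear penalty for a residue paired with a gap
TOY_SCORES = {      # assumed toy values, not a real substitution matrix
    ("K", "K"): 5, ("L", "L"): 4, ("M", "M"): 5,
    ("K", "L"): -2, ("K", "M"): -1, ("L", "M"): 2,
}

def s(a, b):
    if a == GAP and b == GAP:
        return 0.0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))

def column_score(column_freqs, x):
    return sum(f * s(x, b) for b, f in column_freqs.items())

def align_sequence_to_profile(profile, seq):
    n, m = len(profile), len(seq)
    skip = [column_score(col, GAP) for col in profile]  # gap in new sequence
    match = [[column_score(col, x) for x in seq] for col in profile]
    # dp[i][j]: best score for the first i profile columns and first j residues
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + skip[i - 1]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + GAP_PENALTY
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match[i - 1][j - 1],  # residue on column
                dp[i - 1][j] + skip[i - 1],              # gap in new sequence
                dp[i][j - 1] + GAP_PENALTY,              # new gap column in MSA
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned, cols = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + match[i - 1][j - 1]:
            aligned.append(seq[j - 1]); cols.append(i - 1); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + skip[i - 1]:
            aligned.append(GAP); cols.append(i - 1); i -= 1
        else:
            aligned.append(seq[j - 1]); cols.append(None); j -= 1
    return dp[n][m], "".join(reversed(aligned)), cols[::-1]

profile = [
    {"K": 0.75, "M": 0.25},
    {"L": 0.75, "M": 0.25},
    {"K": 0.25, "M": 0.50, GAP: 0.25},
    {"L": 0.75, GAP: 0.25},
    {"K": 0.50, "M": 0.25, GAP: 0.25},
]
score, aligned, cols = align_sequence_to_profile(profile, "KKLLM")
print(score, aligned, cols)
```

Because existing profile columns are never deleted and only new columns are opened, the sketch respects the "once a gap, always a gap" rule. Which alignment is optimal depends entirely on the assumed scores and penalty.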

Evolutionary Profiles

• The profiles just seen are called average profiles
• They generally perform well, but disregard some of the biology
  • How did each position evolve?
  • Amount of conservation varies from position to position
  • Type of conservation varies from position to position
• Alternative: evolutionary profiles
  • Gribskov, M. and Veretnik, S., Methods in Enzymology 266, 198-212, 1996
Progressive multiple alignment
• Feng & Doolittle 1987, Higgins and Sharp 1988

• Idea: sequences to be aligned are phylogenetically related
  • These relationships are used to guide the alignment

• Popular implementations: CLUSTALW, PILEUP, T-Coffee

CLUSTALW

1. Perform pairwise alignments between all pairs of sequences (n(n-1)/2 pairs)
2. Generate a distance matrix
  • Distance between a pair = number of mismatched positions in the alignment divided by the total number of aligned (non-gap) positions
3. Generate a Neighbor-Joining ‘guide tree’ from the distance matrix
4. Use the guide tree to progressively align sequences in pairs, from the tips to the root of the tree
  • Actually, align profiles
  • “Once a gap, always a gap”
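Steps 1 and 2 can be sketched as below. The snippet assumes the pairwise alignments have already been computed and are passed in as gapped strings. For brevity, it reuses rows of a toy MSA as stand-ins for real pairwise alignments. The distance counts mismatches only over positions where neither sequence has a gap; treating a pair with no comparable positions as distance 1.0 is an assumption. Guide-tree construction (step 3) is not shown.

```python
from itertools import combinations

def pairwise_distance(a, b, gap="-"):
    """Fraction of mismatches among positions where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 1.0  # assumption: nothing comparable means maximal distance
    return sum(x != y for x, y in compared) / len(compared)

def distance_matrix(n, aligned_pairs):
    """Symmetric n x n matrix from a {(i, j): (aligned_i, aligned_j)} dict."""
    d = [[0.0] * n for _ in range(n)]
    for (i, j), (a, b) in aligned_pairs.items():
        d[i][j] = d[j][i] = pairwise_distance(a, b)
    return d

# Stand-ins for the n(n-1)/2 pairwise alignments of four sequences.
rows = ["KLM-K", "KLKLK", "KMML-", "ML-LM"]
aligned_pairs = {
    (i, j): (rows[i], rows[j]) for i, j in combinations(range(len(rows)), 2)
}
d = distance_matrix(len(rows), aligned_pairs)
for row in d:
    print([round(x, 2) for x in row])
```

For four sequences there are 4 × 3 / 2 = 6 pairs. The resulting matrix is what a neighbor-joining routine would take as input to build the guide tree.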
CLUSTALW Tree
[Figure: tree calculated from an alignment of more than 1100 ring finger domains, using ClustalW 1.83.]
CLUSTALW heuristics
1. Individual weights are assigned to each sequence in a partial alignment in order to down-weight similar sequences and up-weight highly divergent ones
2. Substitution matrices are varied at different alignment stages according to sequence divergence
3. Gaps
  • Positions in early alignments where gaps have been opened receive locally reduced gap penalties
  • Residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than in regular secondary structure
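The effect of the first heuristic can be illustrated with weighted column frequencies. This is only a sketch: ClustalW derives its sequence weights from the guide tree, whereas the weights below are invented, and `weighted_profile` is a hypothetical helper rather than part of any ClustalW interface.

```python
from collections import defaultdict

def weighted_profile(msa, weights):
    """Column frequencies where each sequence counts in proportion to its weight."""
    if len(msa) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    total = sum(weights)
    profile = []
    for column in zip(*msa):
        freqs = defaultdict(float)
        for sym, w in zip(column, weights):
            freqs[sym] += w / total
        profile.append(dict(freqs))
    return profile

# Three identical sequences and one divergent sequence.
msa = ["KLM", "KLM", "KLM", "MLK"]
uniform = weighted_profile(msa, [1.0, 1.0, 1.0, 1.0])
# Invented weights: down-weight the three near-duplicates, up-weight the outlier.
reweighted = weighted_profile(msa, [0.5, 0.5, 0.5, 1.5])
print(uniform[0], reweighted[0])
```

With uniform weights the first column is 75% K and 25% M; with the invented weights it becomes roughly 50% K and 50% M, so the three redundant sequences no longer dominate the profile.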
Progressive Alignment: Discussion
• Strengths:
  • Speed
  • Progression is biologically sensible (aligns using a tree)
• Weaknesses:
  • No objective function
  • No way of quantifying whether or not the alignment is good
Problems with CLUSTALW

• Local minimum problem:
  • Alignment depends on the order in which sequences are added
  • With each alignment, some proportion of residues is misaligned
    • Worse for divergent sequences
  • Errors get “locked in” and propagate as sequences are added
  • Can result in arbitrary and incorrect alignments
• Clustal uses global alignment, which may not be accurate for all parts of the sequence
  • T-Coffee considers local similarity as well as global
Iterative alignment

• To avoid local minima, realign subgroups of sequences and then incorporate them into a growing multiple sequence alignment
• Improves overall alignment score
  • May involve rebuilding the guide tree
  • May be randomized
• Programs:
  • MultAlin
  • PRRP
  • DIALIGN
