Multiple Sequence Alignment Part 1

CS-434 Bioinformatics

Dr. Urooj Ainuddin

Multiple Sequence Alignment
Chapter 5

Introduction

• A natural extension of pairwise alignment is multiple sequence alignment (MSA), which aligns multiple related sequences to achieve optimal matching of the sequences.
• It arranges sequences in such a way that evolutionarily equivalent positions across all sequences are matched.
Motivation to align multiple sequences

• MSA reveals much more biological information than many pairwise alignments can.
• It allows the identification of conserved sequence patterns and motifs in the whole sequence family, which are difficult to detect by comparing only two sequences.
• Many conserved and functionally critical amino acid residues can be identified in a protein
multiple alignment.
• MSA is an essential prerequisite to carrying out phylogenetic analysis of sequence families
and prediction of protein secondary and tertiary structures.
• Multiple sequence alignment also has applications in designing polymerase chain reaction
(PCR) primers based on multiple related sequences.

Phylogenetic analysis

• Phylogenetic analysis is the study of the evolutionary development of a species or of a particular characteristic of an organism.
• Branching diagrams are made to represent the relationships between different species, organisms, or characteristics of an organism (genes, proteins, organs, etc.) that are descended from a common ancestor.
• The diagram is known as a phylogenetic tree.

Protein structure

• The complete structure of a protein can be described at four different levels of complexity:
1. Primary,
2. Secondary,
3. Tertiary,
4. Quaternary.
• Primary structure is defined as the linear amino acid sequence of a protein's polypeptide chain.
• The term protein sequence is often used interchangeably with primary structure.

Primary structure

• For example, the primary structure of the human hemoglobin β chain begins:
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
PWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF
SDGLAHLDNLKGTFATLSELHCDKLHVDPENFR

Secondary structure

• Secondary structure is defined as the local spatial conformation of the polypeptide backbone, excluding the side chains.
• Secondary structure elements common to many proteins include α-helices and β-sheets.

Tertiary structure

• Tertiary structure refers to the 3D arrangement of all the atoms that make up a protein molecule.
• It describes the precise spatial arrangement of the secondary structure elements and the location of all functional groups of a single polypeptide chain.

Quaternary structure

• Quaternary structure is the 3D structure formed by the association of two or more individual polypeptide chains (subunits) that operate as a single functional unit (multimer).

Polymerase Chain Reaction (PCR)

• Polymerase chain reaction is a widely used method for rapidly making millions to billions of copies (complete or partial) of a specific DNA sample. It allows scientists to take a very small sample of DNA and amplify it (or a part of it) to an amount large enough to study in detail.
• Short DNA sequences called primers are used to select the portion of the genome to be amplified.
• The temperature of the sample is repeatedly raised and lowered to help a DNA replication enzyme (a DNA polymerase) copy the target DNA sequence.
• Billions of copies of the target sequence are created in just a few hours.

Scoring Function

• The goal of MSA is to arrange sequences so that the maximum number of residues from each sequence are matched up according to a particular scoring function.
• The scoring function for multiple sequence alignment is based on the concept of sum of pairs (SP).
• The SP score is the sum of the scores of all possible pairs of sequences in a multiple alignment, based on a particular scoring matrix.
• In calculating the SP score, each column is scored by summing the scores for all possible pairwise matches, mismatches and gap costs, so each column of an alignment of N sequences contributes N(N-1)/2 pair scores.
• The score of the entire alignment is the sum of all the column scores.
• The purpose of most multiple sequence alignment algorithms is to achieve the maximum SP score.

Scoring multiple nucleotide sequences

• We have three sequences: ATT, AT and ACAT.
• We produce the following MSA, where match = 2, mismatch = -1, gap penalty = -2, and a gap aligned with another gap scores 0:
• Sequence 1: A_TT
• Sequence 2: A_T_
• Sequence 3: ACAT
• The score of the first column is 2 + 2 + 2 = 6, that of the second column is 0 - 2 - 2 = -4, that of the third column is 2 - 1 - 1 = 0, and that of the fourth column is -2 - 2 + 2 = -2.
• The SP score of this MSA is 6 - 4 + 0 - 2 = 0.
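
A minimal Python sketch of the SP scoring just described, using this example's match/mismatch/gap values and assuming, as the example does, that a gap aligned with a gap scores 0. The function names are illustrative. It reproduces the four column scores and the total.

```python
from itertools import combinations

# Scoring scheme from the nucleotide example.
MATCH, MISMATCH, GAP = 2, -1, -2

def pair_score(a: str, b: str) -> int:
    """Score one pair of characters in a column; '_' or '-' denotes a gap."""
    a_gap, b_gap = a in "_-", b in "_-"
    if a_gap and b_gap:
        return 0  # gap aligned with gap contributes nothing
    if a_gap or b_gap:
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column) -> int:
    """Sum of the scores of all N(N-1)/2 pairs in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sp_score(alignment) -> int:
    """Sum-of-pairs score of the whole alignment: add up the column scores."""
    return sum(column_score(col) for col in zip(*alignment))

msa = ["A_TT", "A_T_", "ACAT"]
print([column_score(col) for col in zip(*msa)])  # [6, -4, 0, -2]
print(sp_score(msa))                             # 0
```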

Scoring multiple amino acid sequences

• We have three sequences: GKN, TRN and SHE.
• We produce the following MSA, using BLOSUM62 and gap penalty = -8 (this alignment contains no gaps):
• Sequence 1: GKN
• Sequence 2: TRN
• Sequence 3: SHE
• The score of the first column is GT + TS + GS = -2 + 1 + 0 = -1, that of the second column is KR + RH + KH = 2 + 0 - 1 = 1, and that of the third column is NN + NE + NE = 6 + 0 + 0 = 6.
• The SP score of this MSA is -1 + 1 + 6 = 6.
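
The same SP computation with a substitution matrix, as a small Python sketch. To stay self-contained it hardcodes only the eight BLOSUM62 entries this example needs; a real program would load the full 20x20 matrix (with Biopython installed, for example, via `Bio.Align.substitution_matrices.load("BLOSUM62")`).

```python
from itertools import combinations

# Only the BLOSUM62 entries this example needs; the matrix is symmetric.
BLOSUM62_SUBSET = {
    ("G", "T"): -2, ("T", "S"): 1, ("G", "S"): 0,
    ("K", "R"): 2, ("R", "H"): 0, ("K", "H"): -1,
    ("N", "N"): 6, ("N", "E"): 0,
}
GAP = -8  # gap penalty from the example; unused because there are no gaps

def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    # Look the pair up in either order (KeyError if it is not in the subset).
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]

def sp_score(alignment):
    """Sum over columns of the scores of every pair of residues in the column."""
    return sum(pair_score(a, b)
               for column in zip(*alignment)
               for a, b in combinations(column, 2))

msa = ["GKN", "TRN", "SHE"]
print([sum(pair_score(a, b) for a, b in combinations(col, 2)) for col in zip(*msa)])  # [-1, 1, 6]
print(sp_score(msa))  # 6
```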

Exhaustive algorithms

• The exhaustive alignment method involves examining all possible aligned positions simultaneously.
• For aligning N sequences, an N-dimensional matrix must be filled with alignment scores.
• Backtracking is applied through the N-dimensional matrix to find the highest-scoring path, which represents the optimal alignment.
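
For three sequences the exhaustive method fills a 3-dimensional matrix. The Python sketch below assumes SP column scores with the scheme of the nucleotide example (match 2, mismatch -1, gap -2, gap/gap 0); each cell considers the 2^3 - 1 = 7 ways of advancing a non-empty subset of the sequences, and backtracking recovers the alignment. The function names are illustrative. For ATT, AT and ACAT it finds an SP score of 3, showing that the hand-built alignment in the example (score 0) is not optimal under this scheme.

```python
from itertools import combinations, product

MATCH, MISMATCH, GAP = 2, -1, -2  # same scheme as the nucleotide example

def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column):
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def exhaustive_msa3(s1, s2, s3):
    """Fill a 3-D DP matrix over all prefix lengths (i, j, k), then trace back."""
    seqs = (s1, s2, s3)
    dims = [len(s) + 1 for s in seqs]
    # A move marks which sequences contribute a residue (1) or a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    score, back = {(0, 0, 0): 0}, {}
    # Lexicographic order guarantees every predecessor is filled first.
    for cell in product(*(range(d) for d in dims)):
        if cell == (0, 0, 0):
            continue
        best = None
        for move in moves:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue
            column = [seqs[x][prev[x]] if move[x] else "-" for x in range(3)]
            cand = score[prev] + column_score(column)
            if best is None or cand > best:
                best, back[cell] = cand, move
        score[cell] = best
    # Backtrack from the far corner to recover the aligned columns.
    end = tuple(d - 1 for d in dims)
    cell, columns = end, []
    while cell != (0, 0, 0):
        move = back[cell]
        cell = tuple(c - m for c, m in zip(cell, move))
        columns.append([seqs[x][cell[x]] if move[x] else "-" for x in range(3)])
    columns.reverse()
    rows = ["".join(col[x] for col in columns) for x in range(3)]
    return score[end], rows

best, rows = exhaustive_msa3("ATT", "AT", "ACAT")
print(best)  # 3
for row in rows:
    print(row)
```

The number of cells grows as the product of the sequence lengths, which is why this approach is limited to a handful of short sequences.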

Limitations of exhaustive algorithms

• Because the computational time and memory required increase exponentially with the number of sequences, the method is computationally prohibitive for large data sets.
• Full dynamic programming is limited to small datasets of fewer than ten short sequences.
• Few multiple alignment programs employing DP are publicly available.
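
A back-of-the-envelope Python sketch of why the cost explodes, assuming N sequences that all have a hypothetical length of 100: the matrix has (L + 1)^N cells, and each cell examines 2^N - 1 predecessor moves (this ignores the N(N-1)/2 pair scores needed per column). For N = 6 the matrix already has about 10^12 cells.

```python
def dp_cells(length: int, n_seqs: int) -> int:
    """Number of cells in the N-dimensional DP matrix for N sequences of equal length."""
    return (length + 1) ** n_seqs

def moves_per_cell(n_seqs: int) -> int:
    """Predecessors examined per cell: any non-empty subset of the sequences advances."""
    return 2 ** n_seqs - 1

for n in range(2, 7):
    print(f"N={n}: {dp_cells(100, n):,} cells x {moves_per_cell(n)} moves per cell")
```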

Divide and Conquer Alignment (DCA)
https://bibiserv.cebitec.uni-bielefeld.de/dca

• DCA is a web-based program that is in fact semi-exhaustive, because certain steps of the computation are reduced to heuristics.
• It works by breaking each of the sequences into two smaller sections. If the sections are not short enough, further divisions are carried out.
• When the lengths of the subsequences reach a predefined threshold, dynamic programming is applied to align each set of subsequences.
• The resulting short alignments are joined together head to tail to yield a multiple alignment of the entire length of all sequences.
• It performs global alignment and requires the input sequences to be of similar lengths and domain structures.
• Despite the use of heuristics, the program is still extremely computationally intensive and can handle only a very limited number of sequences.

Heuristic algorithms

• The heuristic algorithms for MSA fall into three categories:
1. Progressive alignment,
2. Iterative alignment, and
3. Block-based alignment.

Progressive alignment

• Progressive alignment depends on the stepwise assembly of a multiple alignment and is heuristic in nature.
• It speeds up the alignment of multiple sequences through a multistep process.

Progressive alignment: First step

• It conducts pairwise alignments of all pairs of sequences using the Needleman–Wunsch method and records the similarity scores (based on a particular substitution matrix) from the pairwise comparisons.
• The scores are converted into evolutionary distances to generate a distance matrix.
• The greater the similarity, the smaller the distance.
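
The slides do not say how similarity scores are turned into distances. One simple and widely used choice, assumed here, is the p-distance: one minus the fraction of identical residues among the gap-free columns of a pairwise alignment. The function name is illustrative; it shows the "greater similarity, smaller distance" relationship directly.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Distance = 1 - (identical positions / compared positions), skipping gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    identical = sum(a == b for a, b in compared)
    return 1 - identical / len(compared)

print(p_distance("ACGT", "ACGT"))            # 0.0: identical sequences
print(round(p_distance("A-TT", "ACAT"), 3))  # 0.333: 2 of 3 gap-free columns match
```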

Progressive alignment: Second step

• A guide tree is generated from the distance matrix.
• The tree reflects the evolutionary proximity among all the sequences.

Construction of the guide tree: UPGMA method

• The Unweighted Pair Group Method using Arithmetic averages (UPGMA) is the simplest clustering method.
• The basic assumption of the UPGMA method is that all taxa evolve at a constant rate and are therefore equally distant from the root.
• However, real data rarely meet this assumption, so UPGMA often produces erroneous tree topologies.
• Its advantage is that it is fast.

Example distance matrix:

     A     B     C     D
A    -
B    0.4   -
C    0.35  0.45  -
D    0.6   0.7   0.55  -

1. Join the two closest nodes. These are A and C, with distance 0.35.
2. This new node is U.
3. The branch length from U to A or from U to C is 0.35/2 = 0.175.

[Figure: partial guide tree with A and C joined at node U, each branch 0.175]

1. BU = (BA + BC) / 2 = (0.4 + 0.45) / 2 = 0.425.
2. DU = (DA + DC) / 2 = (0.6 + 0.55) / 2 = 0.575.
This gives the reduced matrix:

     U      B      D
U    -
B    0.425  -
D    0.575  0.7    -

3. Join the two closest nodes. These are B and U, with distance 0.425.
4. This new node is V.
5. The branch length from V to B is 0.425/2 = 0.2125, and the branch length from V to U is 0.2125 - 0.175 = 0.0375.

[Figure: partial guide tree with B and node U (A, C) joined at node V]

1. DV = (DA + DB + DC) / 3 = (0.6 + 0.7 + 0.55) / 3 ≈ 0.617.

     V      D
V    -
D    0.617  -

2. Join D and V.
3. This new node is W.
4. The branch length from D to W is 1.85/6 ≈ 0.308 (half of 0.617), and the branch length from V to W is 0.308 - 0.2125 ≈ 0.096.
5. The guide tree is now complete.

[Figure: complete UPGMA guide tree with D and node V joined at the root W]

• This guide tree will be used in progressive alignment.
• Observe the distance matrix represented by the guide tree (twice the height of the node that joins each pair of leaves). You can see that it is different from the original matrix:

     A      B      C      D
A    -
B    0.425  -
C    0.35   0.425  -
D    0.617  0.617  0.617  -
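
A compact Python sketch of UPGMA, written for readability rather than speed (it recomputes each cluster average from the original leaf distances every round). On the example matrix it reproduces the merge heights 0.175, 0.2125 and about 0.308, and the tree-implied distances: 0.35 for A-C, 0.425 for A-B and B-C, about 0.617 from D to the others. The function names and the dictionary-of-frozensets input format are illustrative choices.

```python
from itertools import combinations

def upgma(matrix):
    """matrix: dict mapping frozenset({a, b}) -> distance between leaves a and b.

    Returns (newick, merges); merges lists (cluster1, cluster2, height) in join order.
    """
    leaves = sorted({leaf for pair in matrix for leaf in pair})
    # Cluster (tuple of member leaves) -> (Newick text, height of its root).
    clusters = {(leaf,): (leaf, 0.0) for leaf in leaves}

    def cluster_distance(c1, c2):
        # Arithmetic average over all leaf pairs, one leaf from each cluster.
        total = sum(matrix[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters; the new node sits at half their distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        height = cluster_distance(c1, c2) / 2
        (text1, h1), (text2, h2) = clusters.pop(c1), clusters.pop(c2)
        # Branch length = height of the new node minus height of the child.
        clusters[c1 + c2] = (f"({text1}:{height - h1:.4g},{text2}:{height - h2:.4g})", height)
        merges.append((c1, c2, height))
    (newick, _), = clusters.values()
    return newick + ";", merges

def cophenetic(merges):
    """Tree distance between two leaves = 2 x height of the node that first joins them."""
    return {frozenset((a, b)): 2 * h for c1, c2, h in merges for a in c1 for b in c2}

D = {frozenset(p): v for p, v in [
    ("AB", 0.4), ("AC", 0.35), ("AD", 0.6), ("BC", 0.45), ("BD", 0.7), ("CD", 0.55)]}
newick, merges = upgma(D)
print(newick)
for pair, d in sorted(cophenetic(merges).items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(pair)), round(d, 3))
```

The same functions can be reused for the homework matrix by building its dictionary in the same way.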

Construction of the guide tree: Neighbor Joining method

• The Neighbor Joining (NJ) method can be used to build a guide tree.
• The NJ method does not assume the taxa to be equidistant from the root.
• It corrects for unequal evolutionary rates between sequences using a conversion step, which calculates “r-values” and “transformed r-values” using the following formulae, where n is the number of nodes in the current distance matrix:

$$r_i = \sum_{j} d_{ij}$$

$$r'_i = \frac{r_i}{n - 2}$$

• Once the “r-values” and “transformed r-values” are in hand, and $d_{ij}$ is the evolutionary distance between $i$ and $j$, the converted distance between $i$ and $j$ is given as:

$$d'_{ij} = d_{ij} - \frac{r_i + r_j}{n - 2} = d_{ij} - r'_i - r'_j$$

• Before tree construction, all nodes are collapsed into a star tree.
• The pair of taxa with the smallest corrected distance in the new matrix is separated from the star tree first.
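
The NJ procedure also needs formulas for the distances from a newly joined node X to the two nodes S and T it replaces, and from X to every other node. These notes do not state them; the standard neighbor-joining formulas, written with the transformed r-values defined above, are:

$$d_{SX} = \frac{1}{2}\left(d_{ST} + r'_S - r'_T\right), \qquad d_{TX} = d_{ST} - d_{SX}$$

$$d_{Xk} = \frac{1}{2}\left(d_{Sk} + d_{Tk} - d_{ST}\right) \quad \text{for every remaining node } k$$

For instance, joining A and B in the example matrix gives $d_{AX} = \frac{1}{2}(0.4 + 0.675 - 0.775) = 0.15$.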

while n > 2 {
    Calculate r_i and r'_i.
    Calculate d'_ij.
    Create the corrected distance matrix.
    Join the two nodes that are closest, S and T. Let the new joined node be X.
    Calculate distances from X to S and T.
    Calculate distances from X to the remaining nodes and create a reduced matrix using the previous matrix.
}
if n == 2
    Join the two nodes and calculate the distance between them.
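
The pseudocode above can be sketched in Python as follows. It uses the r-value formulation from these notes; the branch-length and reduced-matrix formulas are the standard NJ ones ($d_{SX} = \frac{1}{2}(d_{ST} + r'_S - r'_T)$ and $d_{Xk} = \frac{1}{2}(d_{Sk} + d_{Tk} - d_{ST})$), which the slides do not state explicitly. Internal nodes get hypothetical labels X1, X2 and so on.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """names: leaf labels; dist: dict mapping frozenset({x, y}) -> d_xy.

    Returns a list of edges (node, node, branch length).
    """
    d = dict(dist)
    nodes = list(names)
    edges = []

    def D(x, y):
        return d[frozenset((x, y))]

    step = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r_i = sum_j d_ij and r'_i = r_i / (n - 2)
        r_prime = {i: sum(D(i, j) for j in nodes if j != i) / (n - 2) for i in nodes}
        # Join the pair S, T with the smallest corrected distance d'_ij = d_ij - r'_i - r'_j.
        s, t = min(combinations(nodes, 2),
                   key=lambda p: D(*p) - r_prime[p[0]] - r_prime[p[1]])
        step += 1
        x = f"X{step}"  # hypothetical label for the new internal node
        # Distances from X to S and T.
        d_sx = (D(s, t) + r_prime[s] - r_prime[t]) / 2
        edges += [(x, s, d_sx), (x, t, D(s, t) - d_sx)]
        # Distances from X to every remaining node give the reduced matrix.
        for k in nodes:
            if k not in (s, t):
                d[frozenset((x, k))] = (D(s, k) + D(t, k) - D(s, t)) / 2
        nodes = [k for k in nodes if k not in (s, t)] + [x]
    a, b = nodes  # n == 2: join the last two nodes
    edges.append((a, b, D(a, b)))
    return edges

dist = {frozenset(p): v for p, v in [
    ("AB", 0.4), ("AC", 0.35), ("AD", 0.6), ("BC", 0.45), ("BD", 0.7), ("CD", 0.55)]}
edges = neighbor_joining("ABCD", dist)
for edge in edges:
    print(edge)
```

When pairs tie (A-B and C-D do in the first round of the example), `min` keeps whichever pair floating-point arithmetic ranks first, so the internal labels may differ from the slides, but the leaf branch lengths (A 0.15, B 0.25, C 0.15, D 0.4) do not.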
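
Construction of the guide tree: NJ method

     A     B     C     D
A    -
B    0.4   -
C    0.35  0.45  -
D    0.6   0.7   0.55  -

• Calculate $r_i$ and $r'_i$ (here n = 4, so $r'_i = r_i / 2$):
• $r_A = 0.4 + 0.35 + 0.6 = 1.35$, $r'_A = 0.675$
• $r_B = 0.4 + 0.45 + 0.7 = 1.55$, $r'_B = 0.775$
• $r_C = 0.35 + 0.45 + 0.55 = 1.35$, $r'_C = 0.675$
• $r_D = 0.6 + 0.7 + 0.55 = 1.85$, $r'_D = 0.925$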

• Calculate $d'_{ij} = d_{ij} - \frac{r_i + r_j}{n-2} = d_{ij} - r'_i - r'_j$:
• $d'_{AB} = d_{AB} - r'_A - r'_B = 0.4 - 0.675 - 0.775 = -1.05$
• $d'_{AC} = d_{AC} - r'_A - r'_C = 0.35 - 0.675 - 0.675 = -1$
• $d'_{AD} = d_{AD} - r'_A - r'_D = 0.6 - 0.675 - 0.925 = -1$
• $d'_{BC} = d_{BC} - r'_B - r'_C = 0.45 - 0.775 - 0.675 = -1$
• $d'_{BD} = d_{BD} - r'_B - r'_D = 0.7 - 0.775 - 0.925 = -1$
• $d'_{CD} = d_{CD} - r'_C - r'_D = 0.55 - 0.675 - 0.925 = -1.05$

• Create the corrected distance matrix:

     A      B      C      D
A    -
B    -1.05  -
C    -1     -1     -
D    -1     -1     -1.05  -

• Join the two nodes that are closest. Two pairs tie for the smallest corrected distance (-1.05): A and B, and C and D. We pick A and B.
• We join A and B into a new node called U.
• Calculate distances from U to A and B: $d_{AU} = \frac{1}{2}(d_{AB} + r'_A - r'_B) = \frac{1}{2}(0.4 + 0.675 - 0.775) = 0.15$ and $d_{BU} = d_{AB} - d_{AU} = 0.4 - 0.15 = 0.25$.

[Figure: node U joining A and B]

• The new cluster allows the construction of a reduced matrix.
• This starts with distances from the initial matrix.
• Calculate distances from U to the remaining nodes: $d_{UC} = \frac{1}{2}(d_{AC} + d_{BC} - d_{AB}) = \frac{1}{2}(0.35 + 0.45 - 0.4) = 0.2$ and $d_{UD} = \frac{1}{2}(d_{AD} + d_{BD} - d_{AB}) = \frac{1}{2}(0.6 + 0.7 - 0.4) = 0.45$.

     U      C      D
U    -
C    0.2    -
D    0.45   0.55   -

• Calculate $r_i$ and $r'_i$ (now n = 3, so $r'_i = r_i / 1 = r_i$):
• $r_U = 0.2 + 0.45 = 0.65$, $r'_U = 0.65$
• $r_C = 0.2 + 0.55 = 0.75$, $r'_C = 0.75$
• $r_D = 0.45 + 0.55 = 1$, $r'_D = 1$
• Calculate $d'_{ij} = d_{ij} - \frac{r_i + r_j}{n-2} = d_{ij} - r'_i - r'_j$:
• $d'_{CU} = d_{CU} - r'_C - r'_U = 0.2 - 0.75 - 0.65 = -1.2$
• $d'_{DU} = d_{DU} - r'_D - r'_U = 0.45 - 1 - 0.65 = -1.2$
• $d'_{CD} = d_{CD} - r'_C - r'_D = 0.55 - 0.75 - 1 = -1.2$

• Create the corrected distance matrix:

     U      C      D
U    -
C    -1.2   -
D    -1.2   -1.2   -

• Join the two nodes that are closest. All pairs have the same corrected distance (with three nodes this is always the case), so we pick the pair C and U.
• We join C and U into a new node called V.
• Calculate distances from V to C and U: $d_{CV} = \frac{1}{2}(d_{CU} + r'_C - r'_U) = \frac{1}{2}(0.2 + 0.75 - 0.65) = 0.15$ and $d_{UV} = d_{CU} - d_{CV} = 0.2 - 0.15 = 0.05$.

[Figure: node V joining C and node U (A, B)]

• The new cluster allows the construction of a reduced matrix.
• This starts with distances from the previous matrix.
• Calculate the distance from D to the remaining node: $d_{VD} = \frac{1}{2}(d_{CD} + d_{UD} - d_{CU}) = \frac{1}{2}(0.55 + 0.45 - 0.2) = 0.4$.

     V      D
V    -
D    0.4    -

• Because D is the last branch to be separated from the star tree (n = 2), we do not calculate $r_i$ and $r'_i$: $r'_i = r_i / (n - 2)$ is undefined when n - 2 = 0.
• Without $r'_i$, we cannot calculate $d'_{ij}$; none is needed, since only one pair of nodes remains.
• Join the two nodes, D and V, with branch length 0.4.
• The guide tree is now complete.

[Figure: complete NJ guide tree: A and B joined at U, U and C joined at V, V joined to D]

• This guide tree will be used in progressive alignment.
• Observe the distance matrix represented by the guide tree (the sum of branch lengths along the path between each pair of leaves). You can see that it is the same as the original matrix. This happens because these distances are additive (they fit a tree exactly), and NJ reconstructs such a tree exactly:

     A     B     C     D
A    -
B    0.4   -
C    0.35  0.45  -
D    0.6   0.7   0.55  -
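
A quick numerical check of that statement, as a Python sketch. The branch lengths are those of the NJ tree in this example (A-U 0.15, B-U 0.25, U-V 0.05, C-V 0.15, D-V 0.4), obtained with the standard NJ branch-length formula $d_{SX} = \frac{1}{2}(d_{ST} + r'_S - r'_T)$; summing them along each leaf-to-leaf path gives back the original matrix.

```python
from collections import defaultdict
from itertools import combinations

# NJ tree from the worked example (internal nodes U and V).
EDGES = [("U", "A", 0.15), ("U", "B", 0.25), ("V", "U", 0.05),
         ("V", "C", 0.15), ("V", "D", 0.4)]
ORIGINAL = {("A", "B"): 0.4, ("A", "C"): 0.35, ("A", "D"): 0.6,
            ("B", "C"): 0.45, ("B", "D"): 0.7, ("C", "D"): 0.55}

def path_distances(edges, leaves):
    """Sum of branch lengths on the unique path between every pair of leaves."""
    graph = defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
        graph[b].append((a, w))

    def distances_from(source):
        dist, stack = {source: 0.0}, [source]
        while stack:  # depth-first walk; a tree has exactly one path to each node
            node = stack.pop()
            for nxt, w in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    return {(a, b): distances_from(a)[b] for a, b in combinations(leaves, 2)}

tree = path_distances(EDGES, "ABCD")
for pair, d in ORIGINAL.items():
    print(pair, round(tree[pair], 3), d)
```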

Progressive alignment: Third step

• According to the guide tree, the two most closely related sequences are first re-aligned using the Needleman–Wunsch algorithm.
• To align additional sequences, the two already aligned sequences are converted to a consensus sequence with fixed gap positions.
• The consensus is then treated as a single sequence for the next alignment.
• More distant sequences are added in accordance with their relative positions on the guide tree.
• After realignment with a new sequence using dynamic programming, a new consensus is derived.
• The process is repeated until all the sequences are aligned.
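
The third step can be sketched in Python as below. This is a simplified illustration, not Clustal's algorithm: it assumes a linear gap penalty with hypothetical scores (match 2, mismatch -1, gap -2), builds each consensus by taking the most common residue in each column, and keeps earlier gaps fixed by only ever inserting whole gap columns into an existing alignment. The guide tree is written as nested pairs, and the sequences are hypothetical.

```python
from collections import Counter

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scores, linear gap penalty

def needleman_wunsch(a, b):
    """Global alignment of strings a and b; returns the two gapped strings."""
    def sub(x, y):
        return MATCH if x == y else MISMATCH
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + GAP, F[i][j - 1] + GAP)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def consensus(rows):
    """Most common residue in each column, ignoring gaps."""
    return "".join(Counter(c for c in col if c != "-").most_common(1)[0][0]
                   for col in zip(*rows))

def apply_gaps(rows, gapped_consensus):
    """Insert an all-gap column wherever the aligned consensus has a gap.
    Existing columns, including their gaps, are never changed."""
    out, pos = [""] * len(rows), 0
    for ch in gapped_consensus:
        if ch == "-":
            out = [o + "-" for o in out]
        else:
            out = [o + row[pos] for o, row in zip(out, rows)]
            pos += 1
    return out

def progressive_align(tree, seqs):
    """tree: nested pairs of sequence names following the guide tree."""
    if isinstance(tree, str):
        return [seqs[tree]], [tree]
    left_rows, left_names = progressive_align(tree[0], seqs)
    right_rows, right_names = progressive_align(tree[1], seqs)
    g_left, g_right = needleman_wunsch(consensus(left_rows), consensus(right_rows))
    return (apply_gaps(left_rows, g_left) + apply_gaps(right_rows, g_right),
            left_names + right_names)

seqs = {"A": "GATTACA", "B": "GATCA", "C": "GATTGCA", "D": "TTACA"}  # hypothetical
guide_tree = ((("A", "C"), "B"), "D")  # topology of the UPGMA example tree
rows, names = progressive_align(guide_tree, seqs)
for name, row in zip(names, rows):
    print(name, row)
```

Because `apply_gaps` never edits existing columns, a gap placed in an early merge survives to the final alignment; this "once a gap, always a gap" behavior is what the drawbacks of progressive alignment refer to.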

Clustal Omega
https://www.ebi.ac.uk/Tools/msa/clustalo/

• Probably the most well-known progressive alignment program is Clustal.
• Clustal is a progressive multiple alignment program available either as a standalone or as an online program.
• Of all the Clustal tools, Clustal Omega supports the widest variety of operating systems.

• Clustal does not rely on a single substitution matrix. (The matrix-switching and gap-penalty features described here were introduced in ClustalW, the predecessor of Clustal Omega.)
• Instead, it applies different scoring matrices when aligning sequences, depending on their degree of similarity.
• The choice of matrix depends on the evolutionary distances measured from the guide tree.
• For example, for closely related sequences that are aligned in the initial steps, Clustal automatically uses the BLOSUM62 or PAM120 matrix. When more divergent sequences are aligned in later steps of the progressive alignment, the BLOSUM45 or PAM250 matrices may be used instead.
• Another feature of Clustal is the use of adjustable gap penalties that allow more insertions and deletions in regions outside the conserved domains, but fewer in conserved regions.
• In addition, gaps that are too close to one another can be penalized more than gaps occurring in isolated loci.

Drawbacks of progressive alignment

• The progressive alignment method is not suitable for comparing sequences of different lengths because it is based on global alignment.
• Because affine gap penalties grow with gap length, long gaps are heavily penalized and effectively discouraged; in some cases, this may limit the accuracy of the method. (The affine gap penalty is calculated as A + BL, where A is the gap opening penalty, B is the gap extension penalty and L is the length of the gap.)
• The final alignment result is influenced by the order of sequence addition.
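
A small Python sketch of the affine formula A + BL, with hypothetical values A = 10 and B = 1 (many programs use negative scores instead of positive penalties; the arithmetic is the same). It shows that the penalty keeps growing with gap length, while one long gap still costs less than several short gaps covering the same number of positions.

```python
def affine_gap_penalty(length: int, opening: float, extension: float) -> float:
    """A + B*L: opening penalty A plus extension penalty B per gap position."""
    return 0 if length == 0 else opening + extension * length

A, B = 10, 1  # hypothetical penalties
print(affine_gap_penalty(1, A, B))      # 11
print(affine_gap_penalty(3, A, B))      # 13: one gap of length 3
print(3 * affine_gap_penalty(1, A, B))  # 33: three separate gaps of length 1
print(affine_gap_penalty(50, A, B))     # 60: cost keeps growing with length
```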

• A major limitation is the dependence of the algorithm on pairwise alignment.
• Once gaps are introduced in the early steps of alignment, they are fixed.
• Any errors made in these steps cannot be corrected and can propagate throughout the entire alignment.
• The final alignment could therefore be far from optimal.
• The problem is more pronounced when dealing with divergent sequences.

Homework

• For the given distance matrix, form guide trees using both clustering algorithms discussed in this material (UPGMA and NJ).

     A     B     C     D     E
A    -
B    7     -
C    15    9     -
D    11    7     12    -
E    16    8     7     11    -
