
Advanced Gene Sequence Alignment

The document discusses multiple sequence alignment and the ClustalW algorithm. It describes the three main stages of ClustalW as: 1) pairwise alignment to calculate a distance matrix, 2) generating an unrooted Neighbor-Joining guide tree, and 3) progressive alignment according to the guide tree. It also provides details on parameters that can be adjusted, such as gap penalties and scoring matrices, to customize the alignment method for different types of sequences.

Uploaded by

Anwar Ali


Multiple Sequences Alignment

Homology: Definition
Homology: similarity that is the result of inheritance from a common ancestor.
Paralogs: related genes within one organism. Orthologs: related genes in different species. An alignment is a hypothesis of positional homology between bases or amino acids.

Why are multiple sequence alignments used?

Related proteins can often suggest the likely function, structure, and evolution of a sequence. Multiple alignment is more sensitive than pairwise alignment for detecting homologs. It reveals conserved residues or motifs. Database searches effectively perform multiple sequence alignment. The regulatory regions of many genes contain consensus sequences for transcription factor-binding sites.

Information in Multiple Alignment

Conserved regions: regions that are invariant across the whole alignment. These usually indicate regions with a specific function, and can be totally or partially conserved.
Phylogenetic analysis: tells you which sequences are closest. Sequences are arranged from the most closely related to the most distantly related.

Multiple Sequences Alignment -- Goal

To generate a concise, information-rich summary of sequence data. Used to illustrate the similarity between a group of sequences. Used to illustrate the dissimilarity between a group of sequences. Alignments can be treated as models that can be used to test hypotheses.

Alignment can be easy or difficult

Easy

Difficult: due to insertions or deletions

The Methods of Multiple Sequences Alignment

Multiple Sequences Alignment - methods

Methods of solving the multiple alignment problem: manual alignment, dynamic programming, hidden Markov models (HMMs), and progressive alignment.

Manual Alignment
Manual alignment is worth considering when: the alignment is easy; there is some extraneous information to bring in; automated alignment methods have run into the local minimum problem; or an automated alignment can be improved by hand.

Dynamic Programming Alignment

Dynamic programming: consider protein sequences 100 amino acids in length. If it takes 100^2 seconds to completely align 2 of these sequences, it will take 100^3 seconds to align 3 sequences, 100^4 seconds to align 4 sequences, and so on. Completely aligning 20 sequences would take 100^20 = 10^40 seconds, roughly 3.17 x 10^32 years.

Limited to a small number of sequences.
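The growth in cost can be checked with a few lines of arithmetic. This is a rough sketch that assumes the cost model length^N seconds for N sequences; the function name and the 365-day year are illustrative choices, not part of the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def exhaustive_alignment_seconds(num_sequences, length=100):
    """Assumed cost model: length ** num_sequences seconds."""
    return length ** num_sequences


# Cost for 2, 3, 4 and 20 sequences of 100 residues each.
for n in (2, 3, 4, 20):
    seconds = exhaustive_alignment_seconds(n)
    print(f"{n:2d} sequences: {seconds:.3e} s = {seconds / SECONDS_PER_YEAR:.3e} years")

years_for_20 = exhaustive_alignment_seconds(20) / SECONDS_PER_YEAR
```

Each extra sequence multiplies the cost by the sequence length, which is why exhaustive dynamic programming is only practical for a handful of sequences.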

Pairwise Alignment
Aligning two sequences: GATTC and GAATTC
Scoring: match +1, mismatch 0, indel -1

GATTC-
GAATTC

Score = 2

GA-TTC
GAATTC
Column scores: 1 1 -1 1 1 1

Score = 4
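The two scores can be reproduced with a small script. It is a minimal sketch using the slide's scoring (match +1, mismatch 0, indel -1); the gap placements passed to `score_alignment` are assumptions chosen to give the scores 2 and 4, and the dynamic programming routine is a plain global (Needleman-Wunsch style) alignment with a linear gap cost, not ClustalW's own code.

```python
def score_alignment(row_a, row_b, match=1, mismatch=0, indel=-1):
    """Score two already-gapped rows column by column."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += indel
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def global_alignment_score(a, b, match=1, mismatch=0, indel=-1):
    """Best global alignment score by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel          # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * indel          # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + indel, score[i][j - 1] + indel)
    return score[-1][-1]


naive = score_alignment("GATTC-", "GAATTC")    # gap pushed to the end
better = score_alignment("GA-TTC", "GAATTC")   # gap opposite the extra A
best = global_alignment_score("GATTC", "GAATTC")
print(naive, better, best)
```

Dynamic programming finds the best score without enumerating every gap placement by hand.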

Hidden Markov Models

HMMER was written by Sean Eddy (http://hmmer.wustl.edu) and runs on UNIX platforms. Hidden Markov models are probabilistic models that describe the likelihood that an amino acid residue occurs at each given position of an alignment. Two main uses: searching a sequence database with a single profile HMM, and searching a single query sequence against a library of HMMs.

Progressive Alignment
Devised by Feng and Doolittle in 1987. A heuristic method and, as such, not guaranteed to find the optimal alignment. Based on pairwise alignment. The most successful implementation is Clustal (by Des Higgins).

ClustalW

ClustalW - Introduction
. The general purpose is the comparison or alignment of DNA or protein sequences.
. Biologists can study the sequence patterns conserved through evolution and the ancestral relationships between different organisms.
. ClustalW runs on different operating systems, including WinXP, UNIX (Linux), and Macintosh.
. History: the first Clustal programme (1988) by Des Higgins, then ClustalV (1992), ClustalW (1994), and ClustalX.
. The latest version is ClustalW 1.83.

ClustalW download & WWW

Download
http://www.imtech.res.in/pub/mirror_sites/ebi/dos/clustalw/
http://iubio.bio.indiana.edu/soft/iubionew/molbio/dna/analysis/ClustalW/
ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/

WWW
http://www.ebi.ac.uk/clustalw (version 1.83)

Three main stages for ClustalW :

1. Pairwise alignment: calculate the distance matrix
2. Unrooted Neighbour-Joining tree, then rooted NJ tree (guide tree) and sequence weights
3. Progressive alignment: align following the guide tree

Pairwise Alignment
. Each sequence is aligned pairwise with every other sequence.
For example, with n sequences,

${}^{n}C_{2} = \frac{n(n-1)}{2}$

pairwise alignments are calculated.

. Accurate scores come from full dynamic programming alignment using 2 gap penalties (opening and extending) and a full amino acid weight matrix.
. Each pairwise alignment is completely independent.

Calculate distance matrix

Scores from both pairwise methods (fast/approximate and slow/accurate) are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0, which gives the number of differences per site.
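The conversion from percent identity to distance can be sketched as follows. The three sequences are made-up toy data that are treated as already aligned, and counting identity over gap-free columns only is an assumption of this sketch rather than a statement about ClustalW's exact definition.

```python
from itertools import combinations


def percent_identity(row_a, row_b):
    """Percent of gap-free columns in which the two rows are identical."""
    columns = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def distance(row_a, row_b):
    """Distance = 1.0 - (percent identity / 100)."""
    return 1.0 - percent_identity(row_a, row_b) / 100.0


# Hypothetical toy sequences of equal length (no gaps needed).
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACGAACGTTA",
}

# One entry per pair: n(n - 1) / 2 entries for n sequences.
dist = {(a, b): distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}
for pair, d in dist.items():
    print(pair, round(d, 3))
```

With three sequences there are three pairs, and each distance is the fraction of differing sites.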


Guide Tree: unrooted NJ tree

[Figure: unrooted NJ tree with branch lengths]

Generate a Neighbor-Joining guide tree from these pairwise distances. This guide tree gives the order in which the progressive alignment will be carried out.
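A compact sketch of neighbor joining shows how a join order falls out of a distance matrix. It uses the standard Q criterion and branch-length formulas; the five-taxon matrix is a made-up textbook-style example, and the function name and return format are illustrative assumptions, not ClustalW's implementation.

```python
from itertools import combinations


def neighbor_joining_order(names, matrix):
    """Return a list of joins (a, b, length_a, length_b) in the order they are made."""
    d = {frozenset((names[i], names[j])): matrix[i][j]
         for i, j in combinations(range(len(names)), 2)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}

        def q(pair):
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - total[a] - total[b]

        a, b = min(combinations(nodes, 2), key=q)   # pair with the lowest Q value
        dab = d[frozenset((a, b))]
        length_a = dab / 2 + (total[a] - total[b]) / (2 * (n - 2))
        length_b = dab - length_a
        new = (a, b)                                 # the new internal node
        for k in nodes:
            if k != a and k != b:
                d[frozenset((new, k))] = (d[frozenset((a, k))] + d[frozenset((b, k))] - dab) / 2
        nodes = [k for k in nodes if k != a and k != b] + [new]
        joins.append((a, b, length_a, length_b))
    return joins


# Hypothetical distances between five sequences a..e.
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins = neighbor_joining_order(list("abcde"), matrix)
for join in joins:
    print(join)
```

The last two remaining nodes are connected by a single branch, which is what makes the resulting tree unrooted.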


Guide Tree: rooted NJ tree

Each sequence's weight depends on its distance from the root of the tree, but sequences that have a common branch with other sequences share the weight derived from that shared branch.
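The weighting rule can be illustrated on a tiny rooted tree. In this sketch each branch length is divided equally among the sequences below it and summed from root to leaf; the tuple-based tree representation and the branch lengths are hypothetical, and any normalisation of the final weights is left out.

```python
def count_leaves(node):
    """Number of sequences (leaves) below a node."""
    _name, _length, children = node
    return 1 if not children else sum(count_leaves(child) for child in children)


def sequence_weights(node, inherited=0.0, weights=None):
    """Sum branch lengths from root to leaf, sharing each branch among its leaves."""
    if weights is None:
        weights = {}
    name, length, children = node
    share = inherited + length / count_leaves(node)
    if not children:
        weights[name] = share
    else:
        for child in children:
            sequence_weights(child, share, weights)
    return weights


# Node format (assumed): (name, branch length to parent, list of children).
tree = (None, 0.0, [
    ("A", 0.2, []),
    (None, 0.1, [("B", 0.3, []), ("C", 0.3, [])]),
])
weights = sequence_weights(tree)
print(weights)
```

B and C split the 0.1 branch they have in common, so each receives 0.3 + 0.05, while A keeps its whole 0.2 branch.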


Progressive Alignment

Align the two most closely related sequences first. This alignment is then fixed and will never change: once a gap, always a gap.
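The "fixed alignment" rule can be shown in miniature. In this sketch, when a later step needs a new gap in an existing group, the gap is inserted as a whole column and earlier gaps are never removed; the helper name and the example rows are hypothetical.

```python
def insert_gap_column(alignment, column):
    """Insert a gap at `column` in every row of an already fixed alignment."""
    return [row[:column] + "-" + row[column:] for row in alignment]


fixed = ["GA-TTC", "GAATTC"]           # an earlier pairwise alignment
widened = insert_gap_column(fixed, 4)  # a later step needs a gap at column 4
print(widened)

# The residues and their order are untouched; only gap columns were added.
unchanged = [row.replace("-", "") for row in widened] == [row.replace("-", "") for row in fixed]
```

The gap from the first alignment is still at column 2 after the new column is added, which is the behaviour the slide describes.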

Summary
There are three main stages for ClustalW.

Thompson J.D., Higgins D.G., Gibson T.J. (1994). Nucleic Acids Res. 22:4673-4680.

ClustalW spends around 96% of its running time in the first stage, the pairwise alignment of the n sequences; the rest is spent in the second and third stages.

Perform ClustalW alignment

[Figure: ClustalW main menu and input file prompt]

Input File
Prepare the input file: the sequences should all be in one file. Seven formats are accepted: NBRF/PIR, EMBL/SwissProt, FASTA, GDE, Clustal, GCG/MSF, RSF.

Edit the file with a text editor, for example Notepad.

FASTA is the most common format.
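For orientation, a FASTA file holds one record per sequence: a header line starting with ">" followed by one or more lines of residues. The sketch below reads such text into a dictionary; the record names and sequences are made up for illustration, and the parser is a minimal one that ignores header descriptions.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""   # first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)           # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


# Hypothetical input file contents.
example = """>seq1 first toy sequence
GATTC
>seq2 second toy sequence
GAAT
TC
"""
records = parse_fasta(example)
print(records)
```

All sequences to be aligned go into the same file, one record after another.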

Main Menu
1. Sequence input from disk
2. Multiple alignment
3. Profile / structure alignment
4. Phylogenetic tree

Multiple alignment menu
1. Do complete multiple alignment now (slow/fast)
2. Produce guide tree only
3. Do alignment using old guide tree file
4. Slow / fast pairwise alignment
5. Pairwise alignment parameter
6. Multiple alignment parameter
7. Reset gaps before alignment
8. Screen display
9. Output format option

Profile / structure alignment menu
1. Input 1st. profile
2. Input 2nd. profile / sequence
3. Align 2nd. profile to 1st. profile
4. Align sequences to 1st. profile

Phylogenetic tree menu
1. Input alignment
2. Exclude position with gaps
3. Correct for multiple substitutions
4. Draw tree now
5. Bootstrap tree

Toggle slow/fast pairwise alignment

Slow/accurate alignment: fine for short sequences. With many sequences (more than 100) or long sequences (more than 1000 residues), full dynamic programming becomes extremely slow.
Fast/approximate alignment: gains speed by considering only exactly matching fragments (k-tuples) and only the best diagonals.

Pairwise Alignment Parameter (1)

Slow alignment:

. Gap Open Penalty (GOP): the penalty for opening a gap (the initial gap penalty).
. Gap Extension Penalty (GEP): the penalty for extending a gap by 1 residue.

ACGTAAATTTTTGG
ACGT------TTGG

The GOP is charged where the gap opens; the GEP is charged for each residue the gap is extended by.

. Protein Weight Matrix: Gonnet, BLOSUM, PAM

. DNA Weight Matrix: the scores assigned to matches and mismatches

[Figure: example Gonnet, BLOSUM, and PAM scoring matrices]
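The two gap penalties combine into one cost per gap. The sketch below assumes the common convention cost = GOP + GEP x gap length; other conventions charge the first gap position differently, and the penalty values used here are hypothetical rather than ClustalW defaults.

```python
import re


def gap_cost(aligned_row, gap_open, gap_extend):
    """Total penalty over all gap runs, assuming cost = open + extend * length."""
    runs = re.findall(r"-+", aligned_row)
    return sum(gap_open + gap_extend * len(run) for run in runs)


# One gap of length 6, as in ACGT------TTGG aligned to ACGTAAATTTTTGG.
one_long_gap = gap_cost("ACGT------TTGG", gap_open=10, gap_extend=0.5)

# The same 6 gap positions split into three separate gaps cost more.
three_short_gaps = gap_cost("ACGT--A--T--TTGG", gap_open=10, gap_extend=0.5)
print(one_long_gap, three_short_gaps)
```

Because opening is much more expensive than extending, the scoring favours a few long gaps over many short ones.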

Pairwise Alignment Parameters (2)

Fast alignment:
. K-Tuple Size: the size of the exactly matching fragments. Increase for speed (max = 2 for protein, 4 for DNA); decrease for sensitivity.
. Top Diagonals: the number of best diagonals used, ranked by the number of k-tuple matches on each diagonal. Decrease for speed; increase for sensitivity.
. Window Size: the number of diagonals around each of the best diagonals. Decrease for speed; increase for sensitivity.
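The idea behind the fast method can be sketched by counting exact k-tuple matches per diagonal of an imaginary dot-matrix plot. This is an illustration of the principle only, not ClustalW's actual routine; the function names and the diagonal numbering (position in a minus position in b) are assumptions.

```python
from collections import Counter, defaultdict


def diagonal_ktuple_counts(a, b, k):
    """Count exact k-tuple matches on each diagonal (i - j) between a and b."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)              # where each k-tuple occurs in a
    counts = Counter()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            counts[i - j] += 1
    return counts


def top_diagonals(a, b, k, top):
    """Return the `top` diagonals with the most k-tuple matches."""
    return [diag for diag, _ in diagonal_ktuple_counts(a, b, k).most_common(top)]


counts = diagonal_ktuple_counts("GATTC", "GAATTC", k=2)
best = top_diagonals("GATTC", "GAATTC", k=2, top=1)
print(counts, best)
```

A larger k means fewer chance matches to process, and a smaller number of top diagonals means less of the plot is examined, which is the speed-versus-sensitivity trade-off the parameters control.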

Multiple Alignment Parameter

. Increasing the Gap Opening Penalty makes gaps less frequent.
. Increasing the Gap Extension Penalty makes gaps shorter.
. Delay Divergent Sequences: delays the alignment of the most distantly related sequences until the most closely related sequences have been aligned.
. DNA Transition Weight: gives transitions (A<->G, C<->T) a weight between 0 and 1. A weight of 0 scores them as mismatches; a weight of 1 scores them as matches. For distantly related DNA sequences the weight should be close to 0; for closely related DNA sequences a higher weight is appropriate.
. Protein Weight Matrix: the matrix used depends on how similar the sequences being aligned at each alignment step are.
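The effect of the DNA transition weight can be sketched with a unit-scale scoring function. The 0-to-1 scale (match = 1, transversion = 0) follows the description above; the function name is hypothetical and real weight matrices use their own score scale.

```python
# Purine-purine and pyrimidine-pyrimidine substitutions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def dna_pair_score(x, y, transition_weight):
    """Score one aligned pair of bases on a 0..1 scale."""
    if x == y:
        return 1.0                      # match
    if frozenset((x, y)) in TRANSITIONS:
        return transition_weight        # A<->G or C<->T
    return 0.0                          # transversion: plain mismatch


for weight in (0.0, 0.5, 1.0):
    print(weight, dna_pair_score("A", "G", weight), dna_pair_score("A", "C", weight))
```

At weight 0 a transition counts the same as any other mismatch, and at weight 1 it counts the same as a match.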

Output File
CLUSTAL output : [filename].aln

GUIDE TREE : [filename].dnd
