BLAST (Basic Local Alignment Search Tool)

Sequence alignment involves comparing two or more sequences to identify regions of similarity. There are two main types of alignment: global alignment aims to align the full sequences while local alignment focuses on similar stretches. Common methods for sequence alignment include dot matrix analysis and dynamic programming algorithms like the Smith-Waterman algorithm. BLAST is a widely used tool for comparing queries to databases to identify homologous sequences.
Sequence Alignment

• Sequence alignment is the procedure of comparing two (pairwise alignment) or more (multiple sequence alignment) sequences by searching for a series of individual characters or patterns that are in the same order in the sequences.
• There are two types of alignment: local and global.
• In global alignment, an attempt is made to align the entire sequence. Two sequences that have approximately the same length and are quite similar are suitable for global alignment.
• Local alignment concentrates on finding stretches of sequences with a high level of matches.
Methods of sequence alignment
• Dot matrix analysis
• The dynamic programming (DP) algorithm
• Word or k‐tuple methods
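Of the methods listed above, dot matrix analysis is the simplest to illustrate. The following is a minimal sketch, not a reference implementation: it marks every position where a residue of one sequence equals a residue of the other, so runs of similarity show up as diagonals. The function names `dot_matrix` and `print_dot_matrix` are illustrative, and no window or threshold filtering is applied.

```python
def dot_matrix(seq_a, seq_b):
    """Return one string per residue of seq_a: '*' where it equals the
    residue of seq_b in that column, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def print_dot_matrix(seq_a, seq_b):
    """Print the dot plot with seq_b along the top and seq_a down the side."""
    print("  " + " ".join(seq_b))
    for residue, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        print(residue + " " + " ".join(row))


# Example: a diagonal of '*' marks the shared stretch TCG.
print_dot_matrix("ATCG", "TCG")
```

A diagonal run of matches in the plot corresponds to a stretch that a local alignment would pick up.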
Match +2, mismatch 0, gap -1
Sequences: ATCG and TCG

     -  T  C  G
  -  0  0  0  0
  A  0  0  0  0
  T  0  2  1  0
  C  0  1  4  3
  G  0  0  3  6
Match +2, mismatch 0, gap -1
ATCG
T-GC
AT-CG
TGC

     -  T  G  C
  -  0  0  0  0
  A  0  0  0  0
  T  0  2  1  0
  C  0  1  2  3
  G  0  0  3  2
Smith-Waterman algorithm
A-GCT A-CGT
AGC-T ACGT
AGCT

     -  A  G  C  T
  -  0  0  0  0  0
  A  0  2  1  0  0
  C  0  1  2  3  2
  G  0  0  3  2  3
  T  0  0  2  3  4
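The score matrices above can be reproduced with a short sketch of the Smith-Waterman fill step, assuming a linear gap penalty and the slide's scoring (match +2, mismatch 0, gap -1). The function names `smith_waterman_matrix` and `local_align` are illustrative, and the traceback returns only one of possibly several optimal local alignments (it prefers the diagonal move when there is a tie).

```python
def smith_waterman_matrix(seq_a, seq_b, match=2, mismatch=0, gap=-1):
    """Fill the local-alignment score matrix; rows follow seq_a, columns seq_b."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def local_align(seq_a, seq_b, match=2, mismatch=0, gap=-1):
    """Trace back from the highest-scoring cell until a zero cell is reached."""
    H = smith_waterman_matrix(seq_a, seq_b, match, mismatch, gap)
    best, i, j = 0, 0, 0
    for r, row in enumerate(H):
        for c, value in enumerate(row):
            if value > best:
                best, i, j = value, r, c
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal: residues paired
            top.append(seq_a[i - 1]); bottom.append(seq_b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:      # up: gap in seq_b
            top.append(seq_a[i - 1]); bottom.append("-"); i -= 1
        else:                                   # left: gap in seq_a
            top.append("-"); bottom.append(seq_b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


for row in smith_waterman_matrix("ATCG", "TCG"):
    print(row)
print(local_align("ATCG", "TCG"))
```

For ATCG against TCG the highest cell is 6, reached by pairing T, C and G three times at +2 each.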
 

Indels

Deletion:
• Identical sequences:
  X-A-T-G-C-C-A-C-X
  X-A-T-G-C-C-A-C-X
• Deleted G:
  X-A-T-G-C-C-A-C-X
  X-A-T-C-C-A-C-X
• Gap inserted to achieve optimal (identical) alignment:
  X-A-T-G-C-C-A-C-X
  X-A-T-_-C-C-A-C-X

Insertion:
• Identical sequences:
  X-A-T-G-C-C-A-C-X
  X-A-T-G-C-C-A-C-X
• Inserted T:
  X-A-T-G-C-T-C-A-C-X
  X-A-T-G-C-C-A-C-X
• Gap inserted to achieve optimal (identical) alignment:
  X-A-T-G-C-T-C-A-C-X
  X-A-T-G-C-_-C-A-C-X
• BLAST (Basic Local Alignment Search Tool) is a search tool for retrieving information from databases.
• The BLAST software package is used to detect homologies or sequence similarities between two or more nucleotide sequences.
• BLAST is an algorithm for comparing nucleotide or amino acid sequences.
| Program | Type of query seq | Type of database | Comparison | Application |
|---|---|---|---|---|
| BLAST2 | DNA or protein | DNA or protein | Comparison between two sequences | Find level of seq similarity or identity between the input seqs |
| BLASTN | DNA | DNA | Compares NA query seq against NA database | Find DNA seqs that match the query |
| BLASTP | Protein | Protein | Compares protein query seq against protein database | Find identical or homologous proteins |
| BLASTX | DNA | Protein | Compares translated NA seq against a protein sequence database | Find what protein the query seq codes for |
| TBLASTN | Protein | DNA | Compares aa query seq against translation of all possible reading frames of seqs in NA database | Find genes in unknown DNA sequence |
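The choice of program in the table above depends only on the query type and the database type, which can be sketched as a small lookup. This is an illustrative helper, not part of any BLAST distribution: the names `BLAST_PROGRAMS` and `choose_blast_program` are hypothetical, and BLAST2 is left out because it compares two input sequences rather than searching a database.

```python
# (query type, database type) -> program, taken from the table above.
BLAST_PROGRAMS = {
    ("dna", "dna"): "BLASTN",
    ("protein", "protein"): "BLASTP",
    ("dna", "protein"): "BLASTX",    # query is translated before the search
    ("protein", "dna"): "TBLASTN",   # database is translated in all reading frames
}


def choose_blast_program(query_type, database_type):
    """Return the program name for a query/database type pair (case-insensitive)."""
    key = (query_type.strip().lower(), database_type.strip().lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no program listed for query={query_type!r}, database={database_type!r}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("DNA", "Protein"))
```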
| Pairwise alignment (PA) | Multiple sequence alignment (MSA) |
|---|---|
| PA compares two biological seqs of either protein, NA | Compares three or more seqs of.... |
| PA can be categorized as global or local alignment methods | MSA is generally a global alignment |
| PA is used to find out conserved regions between two seqs | Detection of regions of variability or conservation in a family of proteins |
| Searches for similarity in a database | Phylogenetic analysis |
| Examples of tools: BLAST, EMBOSS-Water (local), EMBOSS-Needle (global) (EMBOSS: European Molecular Biology Open Software Suite) | Examples of tools: MUSCLE (MUltiple Sequence Comparison by Log-Expectation), CLUSTALW, T-Coffee (Tree-based Consistency Objective Function For alignment Evaluation) |
What do the colours represent in protein alignments?

| Residue | Colour | Property |
|---|---|---|
| AVFPMILW | Red | Small (small + hydrophobic (incl. aromatic -Y)) |
| DE | Blue | Acidic |
| RK | Magenta | Basic - H |
| STYHCNGQ | Green | Hydroxyl + sulfhydryl + amine + G |
| Others | Grey | Unusual amino/imino acids etc |
What do consensus symbols represent in a Multiple Sequence Alignment?

• An * (asterisk) indicates positions which have a single, fully conserved residue.
• A : (colon) indicates conservation between groups of strongly similar properties as below - roughly equivalent to scoring > 0.5 in the Gonnet PAM 250 matrix:

STA
NEQK
NHQK
NDEQ
QHRK
MILV
MILF
HY
FYW 
• A . (period) indicates conservation between groups of weakly similar properties as below - roughly equivalent to scoring <= 0.5 and > 0 in the Gonnet PAM 250 matrix:

CSA
ATV
SAG
STNK
STPA
SGND
SNDEQK
NDEQHK
NEQHRK
FVLIM
HFY
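The rules above can be turned into a small sketch that derives the consensus line for an alignment. It assumes a column earns ":" or "." only when all of its residues fall inside a single listed group, and it treats any column containing a gap ("-") as unconserved; both points are assumptions of this sketch, not a statement of how any particular alignment program is implemented. The three sequences in the example are made up for illustration.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def consensus_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:                 # assumption: gapped columns get no symbol
        return " "
    if len(residues) == 1:              # single, fully conserved residue
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(aligned_seqs):
    """Build the consensus line for equal-length aligned sequences."""
    return "".join(consensus_symbol(col) for col in zip(*aligned_seqs))


# Made-up example: columns are M/M/M, K/R/K, S/T/A, C/S/A, W/D/G.
alignment = ["MKSCW", "MRTSD", "MKAAG"]
for seq in alignment:
    print(seq)
print(consensus_line(alignment))
```

In the example, K/R/K falls inside the strong group QHRK, S/T/A inside STA, and C/S/A only inside the weak group CSA.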
• Scoring: match +1, mismatch -2, gap 0
• Without gaps:
  ATGCAT
  ATCGAT
  +1+1-2-2+1+1 = 0
• With gaps (indels):
  AT-GCAT
  ATCG-AT
  +1+1+0+1+0+1+1 = 5
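The two totals in this example (0 without gaps, 5 with gaps) can be checked with a minimal scoring sketch. It assumes the ungapped second sequence is ATCGAT, that "-" marks a gap, and that each gap position costs the same flat penalty; the function name `score_alignment` is illustrative.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-2, gap=0):
    """Sum column scores for two aligned rows of equal length ('-' is a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += gap          # indel column
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("ATGCAT", "ATCGAT"))    # ungapped alignment
print(score_alignment("AT-GCAT", "ATCG-AT"))  # alignment with two indels
```

With a gap cost of 0, introducing two indels replaces two mismatches (-2 each) with an extra match, which is why the gapped alignment scores higher under this scheme.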
