BLAST - A Heuristic Algorithm

BLAST is a heuristic algorithm for rapidly searching protein and nucleotide databases to find similar sequences. It is a local alignment tool that works in three steps: 1) it compiles high-scoring words from the query sequence, 2) scans the database for these words, and 3) extends any hits in both directions to identify local alignments and determine their statistical significance. Several versions and improvements to BLAST have been developed, including BLAST2 which requires two high-scoring words in a sequence for extension, and gapped BLAST which allows for gaps in alignments.

Uploaded by

Abhishek Dave
Copyright
© © All Rights Reserved

BLAST - A heuristic algorithm
Anjali Tiwari
Pannaben Patel
Pushkala Venkataraman

BLAST
Basic Local Alignment Search Tool

Rapid searching of protein & nucleotide DBs
Seeking similar sequences

Database
Databases shown: GenBank, SwissProt, PIR, PDB, PRF, nr
nr = non-redundant

Program    Query         Database      Search level
Blastp     Amino acid    Amino acid    Amino acid
Blastn     Nucleotide    Nucleotide    Nucleotide
Blastx     Nucleotide    Amino acid    Amino acid
Tblastn    Amino acid    Nucleotide    Amino acid
Tblastx    Nucleotide    Nucleotide    Amino acid

BLAST 3 STEP ALGORITHM
Compile Words
Scan DB
Extend

Some definitions

Alignment
Process of lining up 2 or more sequences to assess similarity

BLOSUM62
A 20 x 20 substitution matrix for amino acids

Gap
Space introduced into an alignment to compensate for insertions/deletions in 1 sequence relative to another

Similarity Measures

Similarity Matrix - BLOSUM
Local Search Algorithms

Identities & Conservative Replacements = +ve
Unlikely Replacements = -ve

General Concept of working of BLAST

Query input
-> 1000s of sequences
-> Calculate HSP
-> Calculate MSP
-> Display output

HSP = High-scoring Segment Pair
MSP = Maximal Segment Pair

Key Idea BLAST1

Step 1
Compile a list of high scoring words of length w from the query (w=3 for proteins, 11 for nucleic acids)

Step 2
Scan for word hits in the database of score greater than threshold, T

Step 3
Extend each word hit in both directions to find High Scoring Pairs with scores greater than S

Example
Step -1
Query: QQGPHUIQEGQQGKEEDPP
Words of length 3: w = QQG, QGP, GPH, PHU, HUI, ...
Take first triple: QQG
Make neighborhood words: w' = QQG, QEG, GQG
Find high scoring triples: Blosum(w, w') > T, where T = threshold parameter
Suppose Blosum(QQG, QEG) = 18
Blosum(QQG, GQG) = 12
Blosum(QQG, QQG) = 16
T = 13
Choose QQG and QEG since their Blosum values > T
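Step 1 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the neighborhood scores are the slide's own "Suppose" values rather than numbers looked up in a real BLOSUM table.

```python
def compile_words(query, w):
    """Return every overlapping word of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def high_scoring_neighbors(scores, threshold):
    """Keep the neighborhood words whose score is strictly above the threshold."""
    return sorted(word for word, score in scores.items() if score > threshold)


query = "QQGPHUIQEGQQGKEEDPP"
words = compile_words(query, 3)

# Assumed scores against the first word QQG, copied from the slide's example.
supposed_scores = {"QEG": 18, "GQG": 12, "QQG": 16}
kept = high_scoring_neighbors(supposed_scores, threshold=13)

print(words[:5])
print(kept)
```

With T = 13, GQG (score 12) is dropped and QQG and QEG are kept, matching the choice made on the slide.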

Step -2
Suppose database sequence = PKLMMQQGKQEGM

Matching words in DB sequence: QQG and QEG
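A minimal sketch of the database scan in Step 2, assuming all words have the same length and that a simple set lookup stands in for whatever index a real implementation uses. The function name scan_database is hypothetical.

```python
def scan_database(db_sequence, word_list):
    """Return (position, word) for every exact occurrence of a listed word."""
    wanted = set(word_list)
    w = len(word_list[0])  # assumption: every word has the same length
    hits = []
    for i in range(len(db_sequence) - w + 1):
        candidate = db_sequence[i:i + w]
        if candidate in wanted:
            hits.append((i, candidate))
    return hits


hits = scan_database("PKLMMQQGKQEGM", ["QQG", "QEG"])
print(hits)
```

Positions are 0-based, so QQG is reported at offset 5 and QEG at offset 9 of the database sequence.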

Step -3
Query: QQGPHUIQEGQQGKEEDPP
DB sequence: PKLMMQQGKQEGM

Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27

Extension to the right stops here because the BLOSUM value is beginning to decrease
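The rightward extension can be reproduced with a short sketch. Two assumptions are worth stating: the dictionary below holds only the handful of BLOSUM62 entries this example touches (check them against a full copy of the matrix before reuse), and the stopping rule is the slide's simplified "stop at the first decrease". The published BLAST algorithm instead stops when the score falls a fixed amount below the best score seen so far. The names pair_score and extend_right are hypothetical.

```python
# Partial BLOSUM62: only the residue pairs needed for this example.
BLOSUM62_SUBSET = {
    ("Q", "Q"): 5, ("G", "G"): 6, ("K", "K"): 5,
    ("E", "E"): 5, ("E", "Q"): 2, ("D", "G"): -1,
}


def pair_score(a, b):
    """Look up a symmetric substitution score."""
    key = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
    return BLOSUM62_SUBSET[key]


def extend_right(query, db, q_start, d_start, w=3):
    """Extend a word hit to the right until the score first decreases."""
    score = sum(pair_score(query[q_start + i], db[d_start + i]) for i in range(w))
    length = w
    trace = [score]  # running scores, including the one that triggers the stop
    while q_start + length < len(query) and d_start + length < len(db):
        step = pair_score(query[q_start + length], db[d_start + length])
        trace.append(score + step)
        if step < 0:
            break  # simplified rule from the slide
        score += step
        length += 1
    return score, length, trace


query = "QQGPHUIQEGQQGKEEDPP"
db = "PKLMMQQGKQEGM"
# The word QQG at query offset 10 matches the database at offset 5.
best, length, trace = extend_right(query, db, q_start=10, d_start=5)
print(trace, query[10:10 + length], db[5:5 + length])
```

The running scores follow the slide (16, 21, 23, 28, 27), and the retained segment pair is QQGKEE against QQGKQE with score 28.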

ADVANTAGES

Faster than Dynamic Programming
Removes low complexity regions
Spends less time on uninteresting search
Statistical significance of results can be obtained & these are very good

DISADVANTAGES

Finds & reports only local alignments
Finds too many word hits per sequence, thus reducing speed
Does not allow for gaps in sequences

*** New Models to combat disadvantages ***
BLAST2, PSI Blast

BLAST2 - Combination of 2 Hit & Gapped

2 Hit Method - 3 Step method
Step 1 and Step 2 as BLAST 1
Step 3 is where they differ: BLAST now looks for 2 words in a sequence instead of 1 while aligning. The 2 words are at a distance < A and are not overlapping. Typically A=40
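The two-hit condition can be sketched as follows. The slide states only the distance and non-overlap requirements; grouping hits by diagonal (database offset minus query offset) follows the gapped BLAST paper's description and is treated here as an assumption. The function name and the sample hits are hypothetical.

```python
from collections import defaultdict


def two_hit_pairs(hits, w=3, A=40):
    """Find consecutive hits on one diagonal that are non-overlapping and < A apart.

    hits is a list of (query_offset, db_offset) word hits.
    """
    by_diagonal = defaultdict(list)
    for q_pos, d_pos in hits:
        by_diagonal[d_pos - q_pos].append(q_pos)

    pairs = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            distance = second - first
            # distance >= w means the two words do not overlap
            if w <= distance < A:
                pairs.append((diagonal, first, second))
    return pairs


example_hits = [(0, 10), (5, 15), (1, 30), (50, 60)]
print(two_hit_pairs(example_hits))
```

Only the hits at query offsets 0 and 5 qualify: they share diagonal 10 and lie 5 positions apart, while the hit at offset 50 is too far away and the hit on diagonal 29 has no partner.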

Gapped Blast

Gapped alignment is introduced to get an optimal alignment
Two sequences:
Seq A = ACGTA
Seq B = ACATA

Normal alignment is
ACGTA
ACATA

But if the penalty of a mismatch is larger than the combined penalty of the two gaps that replace it, then the optimal alignment is one of those below
AC-GTA
ACA-TA

ACG-TA
AC-ATA
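A small scoring sketch makes the trade-off concrete. The match, mismatch and gap values below are arbitrary assumptions chosen so that one mismatch costs more than two gaps; they are not BLAST defaults, and score_alignment is a hypothetical helper.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-3, gap=-1):
    """Score two equal-length alignment rows column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


ungapped = score_alignment("ACGTA", "ACATA")
gapped_1 = score_alignment("AC-GTA", "ACA-TA")
gapped_2 = score_alignment("ACG-TA", "AC-ATA")
print(ungapped, gapped_1, gapped_2)
```

Under these assumed penalties the ungapped alignment scores 1 (four matches, one mismatch) and each gapped alignment scores 2 (four matches, two gaps), so the gapped form wins. With a milder mismatch penalty such as -1 the ungapped alignment would score higher.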

Gapped BLAST - Allows gaps to come while aligning
Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB:    GGCAGACATGACTGACAAGGGTATCG

Alignment (figure):
ATTGTCAAAGACTTGAGCTGATGCAT
GGCAGACATGA-CTGACAAGGGTATCG

Figure labels: Mismatch, Gap

PSI BLAST - Position specific iterated BLAST. Used for multiple alignments

Query Sequence
-> BLAST search of DB
-> Sequences with high scores collected
-> Multiple alignment & profile made
-> DB searched with profile
-> New sequences added & process iterated
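The "profile made" step can be illustrated with a deliberately simple per-column frequency table. This is a sketch under stated assumptions: PSI-BLAST itself builds a position-specific scoring matrix with sequence weighting, pseudocounts and log-odds scores, none of which is modelled here, and the aligned sequences are invented for the example.

```python
from collections import Counter


def column_frequencies(aligned):
    """Return, for each alignment column, the fraction of each residue (gaps ignored)."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile


# Hypothetical high-scoring sequences, already aligned.
aligned = ["QQGK", "QEGK", "QQGR"]
profile = column_frequencies(aligned)
print(profile)
```

Column 0 is fully conserved (Q with frequency 1.0), while column 1 is two-thirds Q and one-third E. A profile like this is what lets the next search round weight each position differently.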

