Basic Local Alignment Search
Tool
Anum Munir
MS Bioinformatics
BLAST
• Basic Local Alignment Search Tool
• Calculates similarity for biological
sequences.
• Produces local alignments: only a portion
of each sequence must be aligned.
• Uses statistical theory to determine if a
match might have occurred by chance.
BLAST is a heuristic.
• A lookup table is made of all the “words”
(short subsequences) and “neighboring”
words in the query sequence.
• The database is scanned for matching
words (“hot spots”).
• Gapped and un-gapped extensions are
initiated from these matches.
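The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual BLAST implementation: it uses exact word matches (no "neighboring" words), a +1/-1 match/mismatch score instead of a substitution matrix, and un-gapped extension only. The function names, the word size of 3, and the x_drop threshold are hypothetical choices made for this sketch.

```python
from collections import defaultdict


def build_word_table(query, w=3):
    """Map each length-w word of the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def find_hits(table, subject, w=3):
    """Scan the subject and return (query_pos, subject_pos) for every shared word."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a word hit left and right without gaps.

    Extension in a direction stops once the running score falls x_drop
    below the best score seen so far. Returns (best_score, q_start, q_end),
    where the aligned query segment is query[q_start:q_end].
    """
    best = w * match          # the seed word itself is an exact match
    q_start, q_end = i, i + w

    # Extend to the right of the seed.
    running, k = best, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        running += match if query[i + w + k] == subject[j + w + k] else mismatch
        if running > best:
            best, q_end = running, i + w + k + 1
        elif best - running >= x_drop:
            break
        k += 1

    # Extend to the left of the seed, starting from the best score so far.
    running, k = best, 1
    while i - k >= 0 and j - k >= 0:
        running += match if query[i - k] == subject[j - k] else mismatch
        if running > best:
            best, q_start = running, i - k
        elif best - running >= x_drop:
            break
        k += 1

    return best, q_start, q_end
```

With the made-up query "MKTAYIAK" and subject "GGMKTAYLAKQQ", the scan finds three overlapping word hits, and extending the first one covers the whole query despite the single I/L mismatch.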
Finding Model Organisms for Study
of Disease
Can yeast be used as a model
organism to study cystic fibrosis?
Model Organisms
• Cystic fibrosis is a genetic disorder
that affects humans
– If yeast contain a protein that is related
(homologous) to the protein involved in
cystic fibrosis
– Then yeast can be used as a model
organism to study this disease
• Studying the protein in yeast can then tell us about
the function of the related protein in humans
BLAST helps you to find
homologous genes and proteins
Criteria for considering two
sequences to be homologous
• As a rule of thumb, proteins are likely to be homologous if
– Their amino acid sequences are at least
25% identical over a long aligned region
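To make the 25% criterion concrete, here is a small sketch of how percent identity can be computed from two already-aligned sequences. It assumes gaps are written as "-" and that the alignment length (gap columns included) is the denominator, which is how BLAST reports its "Identities" fraction; other conventions exist. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences have the same residue.

    Both inputs must be the same length; '-' marks a gap and never counts
    as an identity. The denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)
```

For the made-up alignment "MKT-AY" against "MRTQAY", four of six columns are identical, so the function returns about 66.7.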
Whenever possible, it is better
to compare proteins
than to compare genes
What does BLAST do?
BLAST compares sequences
• BLAST takes a query sequence
• Compares it with millions of sequences in the
GenBank databases
– By constructing local alignments
• Lists those that appear to be similar to the query
sequence
– The “hit list”
• Tells you why it thinks they are homologs
– BLAST makes suggestions
– YOU make the conclusions
How do I input a query into
BLAST?
Choose which “flavor” of BLAST to
use
• BLAST comes in many “flavors”
– Protein BLAST (blastp)
• Compares a protein query with sequences in the
GenBank protein database
Choose which “flavor” of BLAST
– blastx
• Compares a nucleotide query sequence translated in all
reading frames against a protein sequence database.
• You could use this option to find potential translation
products of an unknown nucleotide sequence.
– tblastn
• Compares a protein query sequence against a
nucleotide sequence database dynamically translated
in all reading frames.
– tblastx
• Compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide
sequence database
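The "translated in all reading frames" step used by blastx, tblastn and tblastx can be sketched as follows. This is an illustration only, assuming the standard genetic code and an unambiguous A/C/G/T sequence; the function names and frame labels ("+1" to "-3") are hypothetical and do not reflect how BLAST itself is implemented.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three frames on the forward strand and three on the reverse strand."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons appear as "*" in the output. A translated search compares each of these six protein strings rather than the nucleotide sequence itself.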
more BLAST programs
Program                    Notes
Contiguous Megablast       Nearly identical sequences
Discontiguous Megablast    Cross-species comparison
Enter your “query” sequence
• A sequence can be entered as
– A FASTA-format sequence
– An accession number
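For readers unfamiliar with FASTA format: each record is a header line starting with ">" followed by one or more lines of sequence. A minimal parser sketch is shown below; the function name is hypothetical and the record in the usage note is invented for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    The header is the line after '>' with surrounding whitespace removed;
    sequence lines are concatenated. Blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Given the text ">example_protein made-up record" followed by the lines "MKTAYIAKQR" and "QISFVKSHFS", the parser returns one record whose sequence is the two lines joined together.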
Choose search set
• Choose which database to search
– Default is non-redundant protein
sequences (nr)
• Searches all databases that contain protein
sequences
Choose organism
• Default is all organisms represented in
databases
BLAST off!!
• Click on the BLAST button at the
bottom of the page!
How do I interpret the results of a
BLAST search?
BLAST creates local alignments
• What is a local alignment?
– BLAST looks for similarities between
regions of two sequences
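The idea of a local alignment can be illustrated with the classical Smith-Waterman dynamic-programming algorithm, which finds the best-scoring pair of regions exactly. BLAST approximates this result heuristically rather than filling the whole table. The sketch below assumes a simple match/mismatch score and a linear gap penalty; these values and the function name are illustrative assumptions, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment between a and b.

    Returns (best_score, (i, j)), where i and j are the 1-based positions
    in a and b at which the best-scoring aligned regions end.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment start fresh anywhere: this is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_end = H[i][j], (i, j)
    return best, best_end
```

For the made-up strings "XXXACGTXXX" and "YYACGTYY", only the shared "ACGT" region contributes, giving a score of 8; the unrelated flanking characters are simply left out of the alignment.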
The BLAST output then
describes how these aligned
regions are similar
• How long are the aligned segments?
• Did BLAST have to introduce gaps in order to
align the segments?
• How similar are the aligned segments?
The BLAST Output
Graphical Overview
The Graphic Display
1. How good is the match?
• Red = excellent!
• Pink = pretty good
• Green = OK, but look at other factors
• Blue = bad
• Black = really bad!
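The colors correspond to ranges of alignment score. The slide gives no numbers, so the thresholds in the sketch below are an assumption based on the color key conventionally shown above the NCBI graphic (below 40, 40-50, 50-80, 80-200, 200 and above); check them against the key on your own results page. The function name is hypothetical.

```python
def score_color(alignment_score):
    """Map an alignment score to the color band used in the graphical overview.

    Thresholds are assumed from the conventional NCBI color key.
    """
    if alignment_score >= 200:
        return "red"
    if alignment_score >= 80:
        return "pink"
    if alignment_score >= 50:
        return "green"
    if alignment_score >= 40:
        return "blue"
    return "black"
```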
Pair-wise alignments
What is an E-value?
• E-value
– The number of matches at least this good
expected to occur by chance
What is an E-value?
• The quality of the alignment is represented
by the Score (S).
• The score of an alignment is calculated as the sum of substitution
and gap scores. Substitution scores are given by a look-up table
(PAM, BLOSUM), whereas gap scores are assigned empirically.
• The significance of each alignment is
computed as an E value (E).
• Expectation value. The number of different alignments with scores
equivalent to or better than S that are expected to occur in a
database search by chance. The lower the E value, the more
significant the score.
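The relationship between S and E can be sketched with the Karlin-Altschul formula, E = K · m · n · e^(-λS), where m is the query length and n the database length. The statistical parameters λ and K depend on the scoring system; the values used in the usage note (λ = 0.267, K = 0.041, often quoted for gapped BLOSUM62 searches) are assumptions for illustration. BLAST also applies length corrections that this sketch omits, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score S into a bit score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score.

    E = K * m * n * exp(-lam * S), with m = query_len and n = db_len.
    Edge-effect (effective length) corrections are not applied here.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

For example, e_value(100, 250, 1_000_000, 0.267, 0.041) describes a raw score of 100 for a 250-residue query against a database of one million residues. Raising the score lowers E exponentially, while doubling the database size doubles E. The same E can be written as m · n · 2^(-S'), which is why bit scores are comparable across scoring systems.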
What is an E-value?
• The E-value is not a probability;
it is an expected value
• The BLAST programs report E-values rather than P-values
because it is easier to understand the difference between,
for example, E-values of 5 and 10 than P-values of 0.993 and
0.99995.
• However, when E < 0.01, P-values and E-values are nearly
identical.
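The numbers above follow from the standard conversion P = 1 - e^(-E), the probability of finding at least one chance alignment when E are expected (this assumes chance hits follow a Poisson distribution). A short sketch, with a hypothetical function name:

```python
import math


def p_value_from_e(e):
    """Probability of at least one chance hit, given E expected: P = 1 - exp(-E)."""
    # expm1 keeps precision for very small E, where P is approximately E.
    return -math.expm1(-e)
```

With E = 5 this gives about 0.993 and with E = 10 about 0.99995, while for E = 0.001 the result is within one millionth of E itself.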
Most people use the E-value
as their first indication of
similarity!
The Alignment
• Look for:
– Long regions of alignment
– With few gaps
– % identity should be >25% for proteins
• (>70% for DNA)
BLAST makes suggestions,
you draw the conclusions!
• Look at E-value
• Look at graphic display
• If necessary, look at alignment