
Basic Local Alignment

BLAST (Basic Local Alignment Search Tool) is a program that compares a query sequence to sequence databases and identifies homologous sequences. It produces local alignments where only a portion of each sequence needs to be aligned. BLAST uses statistical analysis to determine if matches could occur by chance. Key outputs of BLAST include a graphical overview of matches, a hit list with accession numbers and descriptions of matching sequences along with E-values and scores, and pairwise sequence alignments. The lower the E-value, the more significant the match. Users must analyze BLAST results, including evaluating alignments, to draw their own biological conclusions about similarity and homology between sequences.


Basic Local Alignment Search Tool (BLAST)
Anum Munir
MS Bioinformatics
[email protected]

Why Use BLAST?


Concepts of Sequence Similarity Searching
• One sequence by itself is not informative.
• It must be compared against existing sequence databases to develop hypotheses about its relatives and function.
BLAST
• Basic Local Alignment Search Tool
• Calculates similarity for biological
sequences.
• Produces local alignments: only a portion
of each sequence must be aligned.
• Uses statistical theory to determine if a
match might have occurred by chance.

BLAST is a heuristic.
• A lookup table is made of all the “words”
(short subsequences) and “neighboring”
words in the query sequence.
• The database is scanned for matching
words (“hot spots”).
• Gapped and un-gapped extensions are
initiated from these matches.
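
The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it uses exact words only (no "neighboring" words), and the extension stops at the first mismatch instead of using a score-based drop-off. The function names and the word size of 3 are assumptions made for the example.

```python
from collections import defaultdict


def build_word_table(query, word_size=3):
    """Map each word (short subsequence) of the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def find_hot_spots(query, subject, word_size=3):
    """Scan the subject for words that also occur in the query.

    Returns (query_position, subject_position) pairs, in subject order.
    """
    table = build_word_table(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, i, j, word_size=3):
    """Extend an exact word hit left and right while residues keep matching.

    Simplification: real BLAST scores the extension and tolerates mismatches.
    """
    left = 0
    while (i - left - 1 >= 0 and j - left - 1 >= 0
           and query[i - left - 1] == subject[j - left - 1]):
        left += 1
    right = word_size
    while (i + right < len(query) and j + right < len(subject)
           and query[i + right] == subject[j + right]):
        right += 1
    return query[i - left:i + right]
```

For example, `find_hot_spots("MKTAYIAKQR", "GGTAYIAKLL")` finds four overlapping word hits, and extending the first one recovers the shared segment `TAYIAK`.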

Finding Model Organisms for Study
of Disease
Can yeast be used as a model
organism to study cystic fibrosis?

Model Organisms
• Cystic fibrosis is a genetic disorder that affects humans.
– If yeast contains a protein that is related (homologous) to the protein involved in cystic fibrosis,
– then yeast can be used as a model organism to study this disease.
• Studying the protein in yeast can tell us about the function of the protein in humans.
BLAST helps you find homologous genes and proteins

Homologous Proteins (or genes)
• Have a common ancestor (they’re related)
• Usually have similar structures
• Often have similar functions
Criteria for considering two sequences to be homologous
(rules of thumb, not strict definitions)
• Proteins are likely homologous if
– their amino acid sequences are at least 25% identical
• DNA sequences are likely homologous if
– they are at least 70% identical
• Note that the sequences must be over 100 a.a. (or bp) in length
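
As a small sketch, the rules of thumb above can be written as a single check. The function name and its arguments are hypothetical, and the thresholds are only the ones quoted on this slide; a passing result suggests homology, it does not prove it.

```python
def meets_rule_of_thumb(percent_identity, aligned_length, kind):
    """Apply the slide's thresholds: >=25% (protein) or >=70% (DNA), over 100 residues.

    kind is "protein" or "dna"; aligned_length is in amino acids or base pairs.
    """
    thresholds = {"protein": 25.0, "dna": 70.0}
    if kind not in thresholds:
        raise ValueError("kind must be 'protein' or 'dna'")
    return aligned_length > 100 and percent_identity >= thresholds[kind]
```

A 30% identical protein alignment over 250 a.a. passes, while the same identity over only 80 a.a. does not.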

Whenever possible, it is better
to compare proteins
than to compare genes
What does BLAST do?
BLAST compares sequences
• BLAST takes a query sequence
• Compares it with millions of sequences in the GenBank databases
– By constructing local alignments
• Lists those that appear to be similar to the query
sequence
– The “hit list”
• Tells you why it thinks they are homologs
– BLAST makes suggestions
– YOU make the conclusions

How do I input a query into
BLAST?
Choose which “flavor” of BLAST to use
• BLAST comes in many “flavors”
– Protein BLAST (blastp)
• Compares a protein query with sequences in the GenBank protein database
– Nucleotide BLAST (blastn)
• Compares a nucleotide query with sequences in the GenBank nucleotide database
Choose which “flavor” of BLAST
– blastx
• Compares a nucleotide query sequence translated in all
reading frames against a protein sequence database.
• You could use this option to find potential translation
products of an unknown nucleotide sequence.
– tblastn
• Compares a protein query sequence against a
nucleotide sequence database dynamically translated
in all reading frames.
– tblastx
• Compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide
sequence database
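
The "all reading frames" idea used by blastx, tblastn and tblastx can be illustrated with a short six-frame translation. This is a sketch that assumes the standard genetic code and an unambiguous DNA alphabet (A, C, G, T); the function names are chosen for the example and are not part of BLAST.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate frames +1, +2, +3 (forward) and -1, -2, -3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For the toy input `ATGGCC`, frame +1 gives `MA` and frame -1 (read from `GGCCAT`) gives `GH`.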
More BLAST programs

Program                          Notes
Megablast (contiguous)           Nearly identical sequences
Megablast (discontiguous)        Cross-species comparison
PSI-BLAST (position-specific)    Automatically generates a position-specific score matrix (PSSM)
RPS-BLAST (position-specific)    Searches a database of PSI-BLAST PSSMs
Enter your “query” sequence
• A sequence can be entered as a
– FASTA-format sequence
– Accession number
• Protein BLAST accepts only amino acid sequences
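
A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The sketch below reads such text and applies a crude check that a sequence looks like protein rather than nucleotide. Both functions are illustrative helpers written for this example; the check is a rough heuristic and is not how BLAST validates input.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_protein(sequence):
    """Heuristic: only amino acid letters, and not purely nucleotide letters."""
    letters = set(sequence.upper())
    return letters <= PROTEIN_LETTERS and not letters <= NUCLEOTIDE_LETTERS
```

For example, `parse_fasta(">seq1 demo\nMKTAY\nIAKQR\n")` returns one record whose sequence is `MKTAYIAKQR`, which passes the protein check, while `ATGGCC` does not.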

Choose search set
• Choose which database to search
– Default is non-redundant protein
sequences (nr)
• Searches all databases that contain protein
sequences

Choose organism
• Default is all organisms represented in the databases
• Use this to limit your search to one organism (e.g., yeast)
BLAST off!!
• Click on the BLAST button at the
bottom of the page!

How do I interpret the results of a
BLAST search?
BLAST creates local alignments
• What is a local alignment?
– BLAST looks for similarities between
regions of two sequences
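
To make "local" concrete, here is a minimal sketch of the Smith-Waterman local alignment score, which finds the best-scoring pair of regions in two sequences. Note that this is the exact dynamic-programming method, not the faster heuristic that BLAST uses, and the match, mismatch and gap values are arbitrary choices for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0: a local alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The sequences `GGGACGT` and `ACGTCCC` share only the region `ACGT`, and the local score reflects just that region (4 matches at 2 points each), ignoring the unrelated ends.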

The BLAST output then
describes how these aligned
regions are similar
• How long are the aligned segments?
• Did BLAST have to introduce gaps in order to
align the segments?
• How similar are the aligned segments?

The BLAST Output
Graphical Overview

The Graphic Display
1. How good is the match?
• Red = excellent!
• Pink = pretty good
• Green = OK, but look at other factors
• Blue = bad
• Black = really bad!

2. How long are the matched segments?
• Longer = better
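
The colors in the graphic display correspond to ranges of alignment score. The sketch below assumes the score bins of the classic NCBI color key (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more); the exact bins and color names may differ between versions of the BLAST results page, so treat them as an assumption.

```python
def color_for_score(score):
    """Map an alignment score to a display color (assumed classic NCBI bins)."""
    if score >= 200:
        return "red"    # excellent
    if score >= 80:
        return "pink"   # pretty good
    if score >= 50:
        return "green"  # OK, but look at other factors
    if score >= 40:
        return "blue"   # bad
    return "black"      # really bad
```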
The hit list
• BLAST lists the best matches (hits)
– For each hit, BLAST provides:
• Accession number – links to the GenBank flat file
• Description
• “G” = genome link
• E-value
– An indicator of how good the match to the query sequence is
• Score
– Link to an alignment
Pair-wise alignments

What is an E-value?
• E-value
– A measure of how likely it is that the match arose by chance
– The lower the E-value, the more significant the match
• E = 10^-4 is commonly used as the cutoff point
• E = 0 means the E-value is too small to be distinguished from zero; the match is extremely unlikely to be due to chance
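
Applying the cutoff to a hit list is a simple filter. In this sketch the `Hit` record and its field names are hypothetical, chosen to mirror the columns of the hit list; the default cutoff of 1e-4 is the value quoted above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    description: str
    e_value: float
    score: float


def significant_hits(hits, cutoff=1e-4):
    """Keep hits with E-value at or below the cutoff, most significant first."""
    kept = (hit for hit in hits if hit.e_value <= cutoff)
    return sorted(kept, key=lambda hit: hit.e_value)
```

For example, given hits with E-values of 2e-50, 0.3 and 1e-6, only the first and third are kept, in the order 2e-50 then 1e-6.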

What is an E-value?
• The quality of the alignment is represented by the Score (S).
• The score of an alignment is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (PAM, BLOSUM), whereas gap scores are assigned empirically.
• The significance of each alignment is computed as an E-value (E).
• Expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
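
Both quantities can be sketched in code. The score function sums substitution and gap scores column by column; it takes the look-up table as an argument and uses one fixed score per gapped column, which is a simplification (BLAST uses separate gap-opening and gap-extension costs). The E-value function uses the Karlin-Altschul form E = K · m · n · e^(-λS), where m and n are the query and database lengths; λ and K depend on the scoring system, so they are passed in rather than assumed.

```python
import math


def alignment_score(aligned_query, aligned_subject, substitution, gap_score):
    """Sum substitution scores over aligned pairs and gap_score over gapped columns.

    substitution maps (query_residue, subject_residue) pairs to scores.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            total += gap_score
        else:
            total += substitution[(q, s)]
    return total


def expect_value(raw_score, query_length, database_length, lam, k):
    """E = K * m * n * exp(-lambda * S) for a raw score S."""
    return k * query_length * database_length * math.exp(-lam * raw_score)
```

Because the score sits in a negative exponent, a higher score gives a lower E-value, and a larger database gives a proportionally higher one.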
What is an E-value?
• The E-value is not a probability; it is an expected value.
• The BLAST programs report E-values rather than P-values because it is easier to understand the difference between, for example, E-values of 5 and 10 than P-values of 0.993 and 0.99995.
• However, when E < 0.01, P-values and E-values are nearly identical.
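
The numbers above follow from the relation P = 1 - e^(-E), the probability of seeing at least one chance hit when E are expected. A minimal sketch (the function name is chosen for the example):

```python
import math


def p_value_from_e_value(e_value):
    """P = 1 - exp(-E), computed with expm1 for accuracy at small E."""
    return -math.expm1(-e_value)
```

Here E = 5 gives P ≈ 0.993 and E = 10 gives P ≈ 0.99995, while E = 0.01 gives P ≈ 0.00995, which is nearly the same as E.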

Most people use the E-value as their first indication of similarity!
The Alignment
• Look for:
– Long regions of alignment
– With few gaps
– % identity should be >25% for proteins (>70% for DNA)
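
The three things to look for can be read off a pairwise alignment directly. This sketch assumes the two aligned sequences are given as equal-length strings with "-" marking gaps, and it computes percent identity over the full alignment length (gapped columns included); the function name is an assumption for the example.

```python
def alignment_stats(aligned_query, aligned_subject):
    """Return alignment length, number of gapped columns, and percent identity."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_query, aligned_subject))
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    identical = sum(1 for q, s in pairs if q == s and q != "-")
    length = len(pairs)
    percent_identity = 100.0 * identical / length if length else 0.0
    return {"length": length, "gaps": gaps, "percent_identity": percent_identity}
```

For the toy alignment of `MKTAYI-K` with `MKTAYLGK`, this gives a length of 8, one gap, and 75% identity.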

BLAST makes suggestions,
You draw the conclusions!

• Look at E-value
• Look at graphic display
• If necessary, look at alignment

• Make your best guess!
