Lecture 05
Review from last time
Pairwise local alignment (e.g., Smith-Waterman) asks,
“Is any part of A similar to any part of B?”
This is useful for answering specific analytical questions when you have A
and B in hand.
National Center for Biotechnology Information (NCBI)
The most widely used interface for the retrieval of information from biological
databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are
preexisting, logical relationships between the individual entries found in numerous
public databases.
Given a new sequence, we often want to know:
- Where does this come from?
- Which organism/gene/protein does this belong to?
- What other similar sequences exist out there?
To answer these questions, we need a database search.
In database searching, the basic operation is to sequentially align a query
sequence to each subject sequence in the database. The results are reported as a
ranked hit list followed by a series of individual sequence alignments, plus
various scores and statistics.
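The basic operation described above can be sketched in a few lines of Python. This is a toy illustration, not how BLAST is implemented: it scores the query against every subject with a plain Smith-Waterman recurrence, whereas BLAST relies on faster heuristics. The function names, the match/mismatch/gap values, and the toy sequences are all assumptions made for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs. b (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Align the query to each subject in turn; return hits ranked by score."""
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database: subject name -> sequence.
toy_db = {
    "subj1": "MKTAYIAKQR",
    "subj2": "GGGGGGGG",
    "subj3": "TTMKTAYAAA",
}
hits = search_database("MKTAYIAK", toy_db)
for name, score in hits:
    print(name, score)
```

The ranked list puts the subject containing the whole query first and the unrelated subject last, which mirrors the ranked hit list a real search reports.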
FASTA search:
BLAST Website.
USING BLAST: Entering data and parameters
1. Input the query sequence.
2. Select your database.
3. Fine-tune behavior.
4. RUN!
USING BLAST: Alignments summary
[Figure: alignments summary. Annotations: best hit; high coverage of query; low coverage of query.]
BLAST STATISTICS: Identities
The raw score Y is the sum of the substitution-matrix values for all paired
amino acids in the alignment, minus gap penalties.
BLOSUM62 values are the default.
The bit score X is the raw score normalized to be independent of the
scoring system; the details are complicated.
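For reference, the normalization is usually written in the Karlin-Altschul form below. Here S is the raw score (Y above) and S' is the bit score (X above); λ and K are statistical parameters that depend on the substitution matrix and gap penalties. The symbols are the conventional ones, not taken from the slides.

```latex
S' = \frac{\lambda S - \ln K}{\ln 2}
```

Because λ and K absorb the details of the scoring system, bit scores from searches run with different matrices can be compared directly.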
BLAST STATISTICS: Expect (a.k.a. E-value)
                 HIGH similarity    LOW similarity
LOW coverage     Shared domains     Dubious
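The E-value is the number of hits with at least a given score that would be expected by chance in a search of that size, so smaller values indicate more significant hits. A minimal sketch, assuming the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The function name is hypothetical, and BLAST itself uses adjusted "effective" lengths, so reported values will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at or above this bit score."""
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)


# Each additional bit of score halves the expected number of chance hits.
weak_hit = expected_hits(bit_score=30, query_length=300, database_length=10**9)
strong_hit = expected_hits(bit_score=100, query_length=300, database_length=10**9)
print(weak_hit, strong_hit)
```

The same bit score gives a larger E-value in a larger database, which is why E-values from searches against different databases are not directly comparable.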
Other BLAST programs
Program   Query                                            Database
blastx    nucleotide (translated in all 6 reading frames)  protein
tblastn   protein                                          nucleotide (translated in all 6 reading frames)
tblastx   nucleotide (translated in all 6 reading frames)  nucleotide (translated in all 6 reading frames)
blastp (protein query against a protein database) is the most commonly used.
blastx is useful for aligning a coding sequence to its protein product.
(Highlights non-coding DNA; useful in evolutionary analysis.)
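"Translated in all 6 reading frames" means three frames on the forward strand plus three on the reverse complement. The sketch below illustrates the idea with the standard genetic code. The function names are hypothetical, and the code assumes an uppercase DNA sequence containing only A, C, G, and T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Translations for frames +1, +2, +3, then -1, -2, -3."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


frames = six_frame_translations("ATGGCC")
print(frames)
```

A translated search such as blastx compares each of these six protein translations against the database, so the correct frame does not need to be known in advance.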
SUMMARY
BLAST provides a fast method for searching/aligning a
query sequence against a large sequence database.