Lecture 05

Uploaded by

Sporkion Suz-beero

BLAST & Beyond

Computational Biology Lecture 5

Review from last time

Pairwise local alignment (e.g., Smith-Waterman) asks,

“Is any part of A similar to any part of B?”

This is useful for answering specific analytical questions when you have A
and B in hand.

We would frequently prefer to ask a question such as,

“Is any part of A similar to ... anything?”

Here we have sequence A, and “anything” represents a particular set of sequences or a whole sequence database.
FASTA file

Here is a nucleotide sequence as represented in a FASTA (Pearson format) file:
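A FASTA record consists of a header line starting with ">" followed by one or more lines of sequence. As a minimal sketch, the record below is made up for illustration (the name `seq1` and its bases are not from the lecture), and `parse_fasta` is a hypothetical helper rather than part of any library:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical example record, for illustration only.
example = """>seq1 hypothetical example record
ATGGCGTACGTTAGC
TTAGGCTAA
"""

records = parse_fasta(example)
```

The sequence lines are joined, so `records` maps the header text to one continuous string of bases.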

- Where does this come from?
- Which organism/gene/protein does this belong to?
- What other similar sequences exist out there?
For these, we need a database search
In database searching, the basic operation is to sequentially align a query
sequence to each subject sequence in the database. The results are reported as a
ranked hit list followed by a series of individual sequence alignments, plus
various scores and statistics.
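The basic operation described above can be sketched as a loop that aligns the query to every subject and sorts the results. This is an illustration only: it uses a score-only Smith-Waterman with an assumed simple scoring scheme (match +2, mismatch -1, linear gap -2), and the sequence names are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search_database(query, database):
    """Align the query to each subject in turn; return hits ranked by score."""
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


# Hypothetical toy database.
database = {"s1": "TTACGTTT", "s2": "GGGG", "s3": "ACGA"}
hits = search_database("ACGT", database)
```

Every subject costs a full dynamic-programming pass, so the total work grows with the size of the whole database.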
National Center for Biotechnology Information (NCBI)

The most widely used interface for the retrieval of information from biological
databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are
preexisting, logical relationships between the individual entries found in numerous
public databases.

FASTA search:

The first widely used program for database similarity searching was FASTA (Lipman and Pearson, 1985; Pearson and Lipman, 1988; Pearson, 2000).
BLAST
BLAST: the Basic Local Alignment Search Tool

Time complexity & the motivation for BLAST.

Tour of the online version of BLAST.

Interpreting BLAST statistics.

91,568 citations
USING BLAST

Where to use BLAST

Use the BLAST website.

Run BLAST on your own computer, sending the search to a remote database over the internet.

Run BLAST on your own computer against a locally installed database.

*All three are conceptually the same.
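For the two command-line options, a sketch of how the call might be assembled is shown below. It assumes the NCBI BLAST+ `blastp` program is installed; the file name `query.fasta` and the database names are placeholders, and `build_blastp_command` is a hypothetical helper. The only difference between the remote and local cases is the `-remote` flag.

```python
def build_blastp_command(query_path, database, remote=False, evalue=None):
    """Build an argument list for the BLAST+ blastp program."""
    # -outfmt 6 requests tabular output, one hit per line.
    command = ["blastp", "-query", query_path, "-db", database, "-outfmt", "6"]
    if remote:
        command.append("-remote")  # search on NCBI's servers instead of locally
    if evalue is not None:
        command += ["-evalue", str(evalue)]  # E-value reporting threshold
    return command


# Placeholder file and database names.
remote_cmd = build_blastp_command("query.fasta", "nr", remote=True)
local_cmd = build_blastp_command("query.fasta", "my_local_db", evalue=0.001)

# To actually run a search (requires BLAST+ to be installed):
# import subprocess
# subprocess.run(local_cmd, check=True)
```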

https://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAST a protein sequence

Entering data and parameters

Input the query sequence.
Select your database.
Fine-tune behavior.
RUN!
Alignments summary: Best hit!

This is often your query sequence, provided that it was already in the BLAST sequence database.

Alignments summary: Another hit
BLAST STATISTICS

Coverage

Compare the length of the aligned portion of a sequence to its original length to determine coverage: coverage = aligned length / full length.
Coverage tells us how global or local the alignment is: high coverage means most of the sequence takes part in the alignment.

Note: “subject” always refers to a sequence in the database.
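As a small worked example of the coverage calculation, assuming 1-based inclusive alignment coordinates (the convention and the numbers below are illustrative assumptions):

```python
def coverage(aligned_start, aligned_end, full_length):
    """Fraction of a sequence that takes part in the alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    aligned_length = aligned_end - aligned_start + 1
    return aligned_length / full_length


# Hypothetical hit: residues 11..110 of a 200-residue query are aligned.
query_coverage = coverage(11, 110, 200)
```

The same function applies to the subject sequence by passing the subject's coordinates and length.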

Coverage: visualized

[Figure: one hit with high coverage of the query and one hit with low coverage of the query.]
Identities

The number of identically paired amino acids in the alignment (e.g., ‘Val’ over ‘Val’).
Divide Identities by the sequence length, or by the length of the aligned portion, to determine percent identity.
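A sketch of the identity count, taking the two rows of an alignment as equal-length strings with "-" for gaps. The example alignment is made up, and the denominator here is the length of the aligned portion (gap columns included), which is one of the two choices mentioned above.

```python
def count_identities(aligned_query, aligned_subject):
    """Number of alignment columns where both rows hold the same residue."""
    return sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )


def percent_identity(aligned_query, aligned_subject):
    """Identities divided by the length of the aligned portion, as a percentage."""
    identities = count_identities(aligned_query, aligned_subject)
    return 100 * identities / len(aligned_query)


# Hypothetical alignment rows of equal length.
identities = count_identities("MKV-LIS", "MKVALLS")
pct = percent_identity("MKV-LIS", "MKVALLS")
```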
Positives (Similarity)

The number of aligned pairs, identical or not, that score positively under your scoring scheme (e.g., “Ile” over “Leu”).

Recall the BLOSUM62 matrix. Similar amino acids (positives) have a positive score there.
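Counting positives can be sketched with a lookup into the substitution matrix. Only a handful of BLOSUM62 entries are hard-coded below, enough for the made-up example alignment; a real calculation would need the full matrix.

```python
# A small subset of BLOSUM62, for illustration only.
BLOSUM62_SUBSET = {
    ("M", "M"): 5, ("K", "K"): 5, ("L", "L"): 4, ("S", "S"): 4,
    ("I", "L"): 2, ("A", "V"): 0,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def count_positives(aligned_query, aligned_subject):
    """Number of non-gap columns whose substitution score is above zero."""
    return sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if "-" not in (q, s) and pair_score(q, s) > 0
    )


# Hypothetical alignment: V over A scores 0, so it is not a positive;
# I over L scores 2, so it is a positive without being an identity.
positives = count_positives("MKVILS", "MKALLS")
```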
Score = X bits (Y)

Y is the raw score: the sum of the values for all paired amino acids in the alignment, minus gap penalties.
BLOSUM62 values are the default.
The bit score X is the raw score normalized to be independent of the scoring system, so that bit scores from different searches can be compared; the details are complicated.
Expect a.k.a. E-value

Very important! The number of alignments at least this strong (based on Score) that would be expected by chance.

Expect = Query Length × Database Size × 2^(−Bit Score)

Expect ≈ p-value when Expect < 0.01
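The formula can be checked numerically. The sketch below uses made-up lengths and a made-up bit score; it also shows why Expect is close to the p-value when it is small, using the relation p = 1 − e^(−E) for the chance of seeing at least one such alignment.

```python
import math


def e_value(query_length, database_size, bits):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_length * database_size * 2.0 ** (-bits)


def p_value(expect):
    """Probability of at least one chance alignment: 1 - exp(-E)."""
    return -math.expm1(-expect)


# Hypothetical search: 300-residue query, 10^8 residues in the database.
expect = e_value(300, 10**8, 50)
p_small = p_value(0.01)  # very close to 0.01
p_large = p_value(5.0)   # nowhere near 5; a probability cannot exceed 1
```

Each extra bit of score halves the E-value, while doubling the database size doubles it.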
Interpreting BLAST results

                 HIGH similarity                     LOW similarity

HIGH coverage    Your Gene*, Population Variants,    Distant Homologs
                 Close Homologs, Pseudogenes

LOW coverage     Shared Domains                      Dubious
Other BLAST programs

PROGRAM    QUERY                                  DATABASE

blastp     protein                                protein

blastn     nucleotide                             nucleotide

blastx     nucleotide                             protein
           (translated in all 6 reading frames)

tblastn    protein                                nucleotide
                                                  (translated in all 6 reading frames)

tblastx    nucleotide                             nucleotide
           (translated in all 6 reading frames)   (translated in all 6 reading frames)
blastp is the most commonly used.
blastx is useful for aligning a coding nucleotide sequence to its protein product.
(Highlights non-coding DNA; useful in evolutionary analysis.)
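"Translated in all 6 reading frames" means three codon offsets on the given strand plus three on its reverse complement. A minimal sketch using the standard genetic code follows; the input sequence is made up, and unknown codons are shown as "X" by assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Frames +1, +2, +3 on the given strand, then -1, -2, -3 on the other."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


frames = six_frame_translations("ATGGCCTAA")
```

A translated search then compares each of the six protein strings, rather than the nucleotides themselves.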
SUMMARY
BLAST provides a fast method for searching/aligning a query sequence against a large sequence database.

The central idea is to quickly filter database sequences based on small-scale similarity before using dynamic programming (DP).

But how does it do it?

BLAST Algorithm (very brief)
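The filtering idea from the summary can be sketched with exact short-word matches: index every word of length k in the database once, then keep only the subjects that share at least one word with the query. This is a simplification and not the BLAST algorithm itself, which also scores similar (not just identical) words and extends each match before any DP. The sequences and k = 3 are illustrative assumptions.

```python
from collections import defaultdict


def build_word_index(database, k):
    """Map each length-k word to the set of subject names that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k):
    """Subjects sharing at least one exact length-k word with the query."""
    candidates = set()
    for i in range(len(query) - k + 1):
        candidates |= index.get(query[i:i + k], set())
    return candidates


# Hypothetical toy protein database.
database = {"s1": "MKVLISAG", "s2": "GGGGGGGG", "s3": "AAKVLTT"}
index = build_word_index(database, k=3)
candidates = candidate_subjects("QKVLIS", index, k=3)
```

Only the candidates would go on to the expensive alignment step; `s2` shares no word with the query and is never aligned.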
