Blast Tips

This tutorial document provides guidance on customizing BLAST search parameters for specialized searches. It discusses how to: 1) Use BLAST to locate a protein sequence on a chromosome by mapping a human beta-globin protein to chromosome 11. 2) Adjust BLAST settings to search for short primer sequences by changing word size and expect value parameters. 3) Modify settings like word size, complexity filtering and scoring matrix to find remote protein homologs over long evolutionary distances.
Copyright
© Attribution Non-Commercial (BY-NC)

Tutorial: Tips for specialized BLAST searches


June 21, 2011

CLC bio Finlandsgade 10-12 8200 Aarhus N Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com [email protected]

Here, you will learn how to:
- Use BLAST to find the gene coding for a protein in a genomic sequence.
- Find primer binding sites on genomic sequences.
- Identify remote protein homologues.

Following through these sections of the tutorial requires some experience using the Workbench, so if you get stuck at some point, we recommend going through the more basic tutorials first.

Locate a protein sequence on the chromosome


If you have a protein sequence and want to find its location on a chromosome, BLAST makes this easy. In this example, we map the protein sequence of human beta-globin to a chromosome. We know in advance that the beta-globin gene is located somewhere on chromosome 11. The data used in this example can be downloaded from GenBank: Search | Search for Sequences at NCBI

Human chromosome 11 (NC_000011) consists of 134,452,384 nucleotides, and the beta-globin protein (AAA16334) has 147 amino acids.

BLAST configuration

Next, conduct a local BLAST search: Toolbox | BLAST Search | Local BLAST

Select the protein sequence as the query sequence and click Next. Since you wish to BLAST a protein sequence against a nucleotide sequence, use tblastn, which automatically translates the nucleotide sequence selected as database. As Target, select the NC_000011 sequence that you downloaded. If you are used to BLAST, you will know that you usually have to create a BLAST database before BLASTing, but the Workbench does this "on the fly" when you select one or more sequences. Click Next, leave the parameters at their defaults, click Next again, and then click Finish.

Inspect BLAST result

When the BLAST result appears, make a split view so that both the table and the graphical view are visible (see figure 1). This is done by pressing Ctrl (⌘ on Mac) while clicking the table view icon at the bottom of the view. In the table, start by showing two additional columns: "% Positive" and "Query start". Check these in the Side Panel.
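For readers who also work outside the Workbench, the tblastn search configured in this section can be approximated on the command line. The sketch below only builds an argument list for the NCBI BLAST+ `tblastn` program; it assumes BLAST+ rather than the Workbench, and the file names are hypothetical. The `-subject` option searches a FASTA file directly, which is the closest analogue to the "on the fly" database mentioned above.

```python
from typing import List


def tblastn_command(query_fasta: str, subject_fasta: str, out_path: str) -> List[str]:
    """Build an argument list for BLAST+ tblastn (protein query, translated nucleotide target)."""
    return [
        "tblastn",
        "-query", query_fasta,      # protein sequence, e.g. AAA16334
        "-subject", subject_fasta,  # nucleotide FASTA searched directly, no database step
        "-outfmt", "6",             # tabular output, one line per hit
        "-out", out_path,
    ]


# Hypothetical file names; nothing is executed here.
cmd = tblastn_command("AAA16334.fasta", "NC_000011.fasta", "hits.tsv")
print(" ".join(cmd))
```

On a machine with BLAST+ installed, the list could be passed to `subprocess.run(cmd, check=True)`. Default parameters may differ from those of the Workbench, so results need not be identical.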

Now, sort the BLAST table view by clicking the column header "% Positive". Then press and hold Ctrl (⌘ on Mac) and click the header "Query start". The table is now sorted first on % Positive and then on the start position of the query sequence. You can see three regions with a 100% positive hit, at different locations on the chromosome sequence (see figure 1).
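The two-level sort described above can be sketched for a hit table exported to a script. The rows and field names below are hypothetical and are not taken from the tutorial's result; the sketch also assumes that the best % Positive values should come first.

```python
from operator import itemgetter

# Hypothetical rows for illustration only.
hits = [
    {"hit": "A", "pct_positive": 100.0, "query_start": 105},
    {"hit": "B", "pct_positive": 87.5, "query_start": 1},
    {"hit": "C", "pct_positive": 100.0, "query_start": 1},
    {"hit": "D", "pct_positive": 100.0, "query_start": 31},
]

# Python's sort is stable, so sort on the secondary key first,
# then on the primary key; ties keep their query-start order.
by_start = sorted(hits, key=itemgetter("query_start"))
ordered = sorted(by_start, key=itemgetter("pct_positive"), reverse=True)

for row in ordered:
    print(row["hit"], row["pct_positive"], row["query_start"])
```

The hits with 100% positives end up at the top, ordered by where they start on the query, which is the view used to spot the separate regions.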

Figure 1: Placement of translated nucleotide sequence hits on the Human beta-globin.

Why did we find, on the protein level, three identical regions between our query protein sequence and the nucleotide database? The beta-globin gene is known to have three exons, and this is exactly what we find in the BLAST search. Each translated exon hits the corresponding sequence on the chromosome. If you place the mouse cursor on the sequence hits in the graphical view, you can see the reading frame, which is -1, -2 and -3 for the three hits, respectively.

Verify the result

Open NC_000011 in a view, go to the Hit start position (5,204,729) and zoom in to see the blue gene annotation. You can now see the exon structure of the human beta-globin gene, with its three exons on the reverse strand (see figure 2). If you wish to verify the result, make a selection covering the gene region and open it in a new view: right-click | Open Selection in New View | Save
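The negative reading frames reported for the hits follow from how a translated search treats the nucleotide sequence: it is translated in three frames on the forward strand and three on the reverse complement. The sketch below is a simplified illustration using the standard genetic code, not the Workbench's or BLAST's actual implementation, and its frame numbering (offset from the start of each strand) is an assumption that may differ in detail from what BLAST reports.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(dna: str) -> Dict[int, str]:
    """Map frame (+1..+3, -1..-3) to its translation."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


# A toy sequence whose reverse strand encodes Met-Val-His.
frames = six_frame("ATGCACCAT")
print(frames[-1])
```

A gene on the reverse strand, such as the one in this example, only yields its protein in the negative frames. Its exons can fall in different negative frames because the introns between them shift the codon phase.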

Figure 2: Human beta-globin exon view.

Save the sequence, and perform a new BLAST search:
- Use the new sequence as query.
- Use blastx.
- Use the protein sequence, AAA16334, as database.

Using the genomic sequence as query, the mapping of the protein sequence to the exons is visually very clear, as shown in figure 3. In theory you could use the whole chromosome sequence as query, but performance would suffer: the search would take a long time, and the computer might run out of memory.

In this example, you have used well-annotated sequences, where you could have searched for the name of the gene instead of using BLAST. However, there are other situations where you either do not know the name of the gene, or the genomic sequence is poorly annotated. In these cases, the approach described in this tutorial can be very productive.

BLAST for primer binding sites


You can adjust the BLAST parameters so that short primer sequences can be matched against a larger sequence. This makes it easy to check whether existing lab primers can be reused for other purposes, or whether the primers you designed are specific.

Purpose                Standard BLAST   Primer search
Program                blastn           blastn
Word size              11               7
Low complexity filter  On               Off
Expect value           10               1000

These settings are shown in figure 4.
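Why the word size matters for primers can be illustrated with a toy seeding check. A nucleotide BLAST search starts from exact word matches of the configured length, so a short primer with a mismatch near its middle may contain no exact 11-mer match at all. The function below is a deliberately simplified sketch of that seeding step only (no extension or scoring), and the primer and target sequences are made up for illustration.

```python
def shares_exact_word(query: str, target: str, word_size: int) -> bool:
    """Return True if any word of the given size in query occurs exactly in target."""
    target_words = {
        target[i:i + word_size] for i in range(len(target) - word_size + 1)
    }
    return any(
        query[i:i + word_size] in target_words
        for i in range(len(query) - word_size + 1)
    )


# Hypothetical 18-nt primer and a target site with one central mismatch (G -> T).
primer = "ACGTTGCAAGGCTTACGA"
target = "TTTT" + "ACGTTGCAATGCTTACGA" + "CCCC"

print(shares_exact_word(primer, target, 11))  # no exact 11-mer is shared
print(shares_exact_word(primer, target, 7))   # a 7-mer seed exists
```

With a word size of 11 the imperfect binding site is never seeded, while a word size of 7 finds it. The higher expect value in the table serves a related purpose: short matches have low scores, so they would otherwise be filtered from the report.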

Finding remote protein homologues


To look for short, identical peptide sequences in a database, the standard BLAST parameters have to be reconfigured. With the parameters below, you are likely to be able to identify whether antigenic determinants will cross-react with other proteins.

Purpose                Standard BLAST   Remote homologues
Program                blastp           blastp
Word size              3                2
Low complexity filter  On               Off
Expect value           10               20000
Scoring matrix         BLOSUM62         PAM30

Figure 3: Verification of the result: at the top a view of the whole BLAST result. At the bottom the same view is zoomed in on exon 3 to show the amino acids.

These settings are shown in figure 5.
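As a rough command-line counterpart to the protein settings in this section, the sketch below builds an argument list for the NCBI BLAST+ `blastp` program with word size 2, low complexity filtering off, expect value 20000 and the PAM30 matrix. It assumes BLAST+ rather than the Workbench; the file and database names are hypothetical, and the options are worth checking against `blastp -help` for the installed version.

```python
from typing import List


def blastp_short_command(query_fasta: str, db_name: str, out_path: str) -> List[str]:
    """Build a BLAST+ blastp argument list using the settings from the table above."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_name,
        "-word_size", "2",   # shorter seed words than the default of 3
        "-seg", "no",        # low complexity filter off
        "-evalue", "20000",  # report low-scoring short matches
        "-matrix", "PAM30",  # matrix suited to short, near-identical matches
        "-outfmt", "6",      # tabular output
        "-out", out_path,
    ]


# Hypothetical names; nothing is executed here.
cmd = blastp_short_command("peptide.fasta", "my_proteins", "peptide_hits.tsv")
print(" ".join(cmd))
```

BLAST+ also provides a `blastp-short` task intended for short queries; whether its presets match the table exactly should be confirmed in the BLAST+ documentation.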

Further reading
A valuable source of information about BLAST can be found at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide. Remember that BLAST is a heuristic method: certain assumptions are made so that searches can be done in a reasonable amount of time. As a result, BLAST is not guaranteed to find the optimal alignment or to report every relevant hit. For very accurate results, consider using other algorithms, such as Smith-Waterman. You can read "Bioinformatics explained: BLAST versus Smith-Waterman" here: http://www.clcbio.com/BE.

Figure 4: Settings for searching for primer binding sites.

Figure 5: Settings for searching for remote homologues.
