Blast Tips
CLC bio Finlandsgade 10-12 8200 Aarhus N Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com [email protected]
Tutorial
Following through these sections of the tutorial requires some experience using the Workbench, so if you get stuck at some point, we recommend going through the more basic tutorials first.
Human chromosome 11 (NC_000011) consists of 134452384 nucleotides and the beta-globin (AAA16334) protein has 147 amino acids.
BLAST configuration
Next, conduct a local BLAST search:
Toolbox | BLAST Search | Local BLAST
Select the protein sequence as query sequence and click Next. Since you wish to BLAST a protein sequence against a nucleotide sequence, use tblastn, which automatically translates the nucleotide sequence selected as database. As Target, select the NC_000011 sequence that you downloaded. If you are used to BLAST, you will know that you usually have to create a BLAST database before BLASTing, but the Workbench does this "on the fly" when you just select one or more sequences. Click Next, leave the parameters at their default values, click Next again, and then click Finish.
Inspect BLAST result
When the BLAST result appears, make a split view so that both the table and the graphical view are visible (see figure 1). This is done by pressing Ctrl (⌘ on Mac) while clicking the table view icon at the bottom of the view. In the table, start by showing two additional columns: "% Positive" and "Query start". These are enabled by checking them in the Side Panel.
Now, sort the BLAST table view by clicking the column header "% Positive". Then, press and hold the Ctrl button (⌘ on Mac) and click the header "Query start". The table is now sorted first by % Positive and then by the start position of the query sequence. You can see that there are actually three regions with a 100% positive hit, but at different locations on the chromosome sequence (see figure 1).
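The two-level sort described above can be sketched outside the Workbench as follows. This is a minimal illustration, not the Workbench's implementation: the rows are made-up values rather than real BLAST output, the field names `percent_positive` and `query_start` are hypothetical, and it assumes % Positive is sorted from highest to lowest.

```python
# Made-up rows for illustration only; these are not real BLAST results.
hits = [
    {"hit": "A", "percent_positive": 100.0, "query_start": 105},
    {"hit": "B", "percent_positive": 85.0, "query_start": 40},
    {"hit": "C", "percent_positive": 100.0, "query_start": 1},
    {"hit": "D", "percent_positive": 100.0, "query_start": 31},
]


def sort_hits(rows):
    """Sort by % Positive (highest first), then by Query start (lowest first)."""
    # Negating the first key gives a descending primary sort while the
    # secondary key stays ascending.
    return sorted(rows, key=lambda r: (-r["percent_positive"], r["query_start"]))


for row in sort_hits(hits):
    print(row["hit"], row["percent_positive"], row["query_start"])
```

With a compound key like this, all 100% positive hits are grouped together and then listed in the order in which they occur along the query protein.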
Figure 1: Placement of translated nucleotide sequence hits on the Human beta-globin.
Why did we find, on the protein level, three identical regions between our query protein sequence and nucleotide database? The beta-globin gene is known to have three exons and this is exactly what we find in the BLAST search. Each translated exon will hit the corresponding sequence on the chromosome. If you place the mouse cursor on the sequence hits in the graphical view, you can see the reading frame which is -1, -2 and -3 for the three hits, respectively.
Verify the result
Open NC_000011 in a view, and go to the Hit start position (5,204,729) and zoom to see the blue gene annotation. You can now see the exon structure of the Human beta-globin gene showing the three exons on the reverse strand (see figure 2). If you wish to verify the result, make a selection covering the gene region and open it in a new view:
right-click | Open Selection in New View | Save
Figure 2: Human beta-globin exon view.
Save the sequence, and perform a new BLAST search:
Use the new sequence as query.
Use BLASTx.
Use the protein sequence, AAA16334, as database.
Using the genomic sequence as query, the mapping of the protein sequence to the exons is visually very clear as shown in figure 3. In theory you could use the chromosome sequence as query, but the performance would not be optimal: it would take a long time, and the computer might run out of memory.
In this example, you have used well-annotated sequences where you could have searched for the name of the gene instead of using BLAST. However, there are other situations where you either do not know the name of the gene, or the genomic sequence is poorly annotated. In these cases, the approach described in this tutorial can be very productive.
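Both searches in this tutorial rely on translating nucleotide sequence into protein: tblastn translates the database, and BLASTx translates the query. The sketch below shows the idea of a six-frame translation using the standard genetic code. It is only a conceptual illustration and not how BLAST or the Workbench is implemented. The function names are hypothetical, and the numbering of the negative frames (frame -1 starts at the first base of the reverse complement) is an assumption, since tools differ on this convention.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frames +1..+3 and -1..-3 to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])    # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])  # reverse strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGTGCAT").items()):
        print(frame, protein)
```

A hit reported in a negative frame, as for the three exons here, means that the matching protein sequence is read from the reverse strand of the chromosome.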
Figure 3: Verification of the result: at the top a view of the whole BLAST result. At the bottom the same view is zoomed in on exon 3 to show the amino acids. These settings are shown in figure 5.
Further reading
A valuable source of information about BLAST can be found at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide.
Remember that BLAST is a heuristic method. This means that certain assumptions are made to allow searches to be done in a reasonable amount of time. As a consequence, BLAST is not guaranteed to find the optimal alignment, and it may miss some matches. When you need the optimal local alignment, you should consider using other algorithms, such as Smith-Waterman. You can read "Bioinformatics explained: BLAST versus Smith-Waterman" here: http://www.clcbio.com/BE.
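To make the contrast with a heuristic search concrete, here is a minimal sketch of the Smith-Waterman local alignment algorithm. It fills the complete dynamic programming matrix, which is why it finds the optimal local alignment under the chosen scoring scheme and also why it is slow for long sequences. The scoring values (match 2, mismatch -1, linear gap penalty -2) are simple assumptions chosen for illustration; real protein searches typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero, which makes it local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("AAAGATTACA", "GATTACA"))
```

The running time and memory grow with the product of the two sequence lengths, which is impractical for a query against a whole chromosome; this is the cost that the heuristics in BLAST are designed to avoid.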