MAM 5108: Microbial Bioinformatics

Dr. Pasan Fernando

Practical 2: Sequence analysis and alignment

In this practical class, you will learn how to perform local and global alignments using
bioinformatics tools. Furthermore, you will perform computational translation of coding
sequences and multiple sequence alignments. Again, you will work with Bacillus thuringiensis
(Bt) endotoxins.

1. Using the Basic Local Alignment Search Tool (BLAST) for performing a database search
in GenBank (20 marks)

BLAST is the most popular bioinformatics program to conduct local alignment between
sequences. It is widely used to retrieve similar sequences for an input sequence from
biological databases.
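
To make the idea of a local alignment concrete, the following is a minimal sketch of the Smith-Waterman scoring recurrence, the exact dynamic-programming method that BLAST approximates with faster heuristics. The function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared region "ACG" is found even though the flanks differ.
    print(local_alignment_score("TTTACGTTT", "GGACGGG"))
```

Because scores are floored at zero, only the best-matching region contributes to the result, which is why a local search can find a gene inside a much longer database record.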

Go to the National Center for Biotechnology Information (NCBI) BLAST tool by
performing a Google search or using the following link:
https://blast.ncbi.nlm.nih.gov/Blast.cgi

Select the Nucleotide BLAST program from the program list. Then, copy and paste the
“Unknown_cry_cds.fasta” sequence given to you, including the header, into the query
sequence box. Alternatively, you can upload the FASTA file by clicking on the “choose
file” button. After you copy and paste the sequence, the FASTA header will be
automatically added to the “Job title” field. You can give another title if required.

Then, under the “Choose search set” section, select “Nucleotide collection” as the
standard database (default option). Under the “program selection” section, select
“Highly similar sequences (megablast)” as the program optimization method.

Finally, tick the box to show results in a new window and click on the “BLAST” button to
run the program. You can run the program using the default algorithm parameters.
However, for advanced search queries, you can change these parameters by expanding
the “Algorithm parameters” section. Depending on the query and connection speed, the
BLAST program may take a few minutes to generate the results in a new window. Make
sure that you have a good network connection while running the program.

Answer the following questions based on the BLAST results.

a. According to the top hits, what are the gene name and the organism (including
subspecies) of the input sequence? (2 marks)

b. Give two synonyms for the gene name of the input sequence (use knowledge
from your previous lab). (2 marks)

c. Based on the Max Score of the BLAST hits, are there multiple hits with the
highest Max Score? If yes, how many hits? And what is the highest Max Score? (3
marks)

d. What can you conclude from the Query Cover and the E-values of the hit(s) you
found in part (c)? (4 marks)

e. What can you conclude about the homology of hit(s) you found in part (c) based
on the Percent Identity value(s)? (3 marks)

f. What can you conclude from the accession length of the hit(s) you found in part
(c)? (2 marks)

g. What could be the reason for finding multiple hits with the same highest Max
Score? (2 marks)

h. By considering all the results what would be your pick as the best hit? Write its
GenBank accession number and explain the reason for your selection. (2 marks)

2. Using the Expert Protein Analysis System (Expasy) Translate tool to translate a coding
sequence of a gene. (5 marks)

You can use the Expasy translate tool to translate a coding sequence (without introns)
or an mRNA sequence to the corresponding amino acid sequence.

Go to the Expasy translate tool by clicking on the given link or searching for it on Google.
https://web.expasy.org/translate/

Copy the sequence in the “Unknown_cry_cds.fasta” file and paste it into the DNA or
RNA sequence box. Select the following parameter settings and click on the translate
button.
• Output format: Compact
• DNA strands: both forward and reverse
• Genetic codes: standard
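
The settings above (both strands, standard genetic code) amount to a six-frame translation. The sketch below shows roughly what such a translation involves; it is not Expasy's implementation. The function names and the frame labels ("+1" to "-3") are assumptions made for illustration, and `longest_orf` uses a deliberately simple rule: the longest stretch from a methionine to the next stop codon or the end of the frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA (or RNA) sequence, returned as DNA."""
    return dna.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X', stops become '*'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translate three forward and three reverse-complement reading frames."""
    frames = {}
    for sign, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(protein):
    """Longest stretch starting at 'M' and running to the next stop or the end."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein, longest_orf(protein))
```

Comparing `longest_orf` across the six frames mirrors the manual step of picking the frame with the longest uninterrupted amino acid sequence.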

a. In the results, select the open reading frame with the longest continuous amino
acid sequence. What is this reading frame for the sequence? What is the reason
for selecting the longest continuous sequence? (3 marks)

b. Then, click on the starting methionine residue (letter “M” in red color) of the
longest amino acid sequence. This will give you two views for the selected
sequence: Pseudo-entry and FASTA format. What is the length of this amino acid
sequence? (2 marks)

c. Now, click on the download button on top of the Pseudo-entry result and select
the FASTA format to download. Then rename the downloaded file, changing its
extension from “.txt” to “.fasta”.
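
Renaming the file only changes its extension; the content is already FASTA text, that is, a header line starting with ">" followed by one or more sequence lines. As a rough sketch of how such a file can be read and the sequence length checked, assuming a hypothetical function name `parse_fasta` and made-up example records:

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">demo_protein example record\nMKVLA\nGGT\n"  # made-up data
    for name, sequence in parse_fasta(example):
        print(name, len(sequence))
```

The same function handles a file with several records, such as a multi-sequence FASTA download.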

3. Using the NCBI blastx program to search protein databases using a translated nucleotide
query. (10 marks)

With the blastx program, you do not need to translate your coding sequence or mRNA
sequence separately. The program takes the nucleotide sequence as input, translates it
in all six reading frames, and searches the protein database for the best hit. Use the
blastx program to search for the best protein hit for the “Unknown_cry_cds.fasta” coding
sequence.

Go to the NCBI BLAST using the following link or a Google search. Then, select the blastx
search.
https://blast.ncbi.nlm.nih.gov/Blast.cgi

Click on the “choose file” button under the “enter query sequence” section and upload the
“Unknown_cry_cds.fasta” file you saved before. Alternatively, you can copy and
paste the sequence into the sequence box. Then, make sure UniProtKB/Swiss-Prot is
selected as the database. Tick “show results in a new window” and click on the BLAST
button.

a. What is the UniProt ID of the best hit? What are the reasons for selecting this
UniProt record as the best hit? Explain using result metrics such as Max Score, E-
value, etc. (8 marks)

b. Access the UniProt record for the best hit. What are the name and the function
of the protein according to the UniProt record? (2 marks)

c. Finally, download the amino acid sequence in FASTA format from the UniProt
record.

4. Performing a pairwise global alignment using the Needleman-Wunsch alignment
algorithm. (7 marks)

EMBOSS Needle is an online tool that reads two input sequences and writes their
optimal global sequence alignment to a file. It uses the Needleman-Wunsch alignment
algorithm to find the optimum alignment (including gaps) of two sequences along their
entire length. You can access EMBOSS Needle using the following link:
https://www.ebi.ac.uk/Tools/psa/emboss_needle/
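
As a rough illustration of what a global alignment computes, here is a minimal Needleman-Wunsch sketch with traceback. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap -1) chosen for readability; EMBOSS Needle itself scores proteins with a substitution matrix and separate gap-open and gap-extend penalties, so its scores will differ from this sketch.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Unlike local alignment, leading gaps are penalised along the borders.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[len(a)][len(b)]


if __name__ == "__main__":
    aligned_a, aligned_b, total = needleman_wunsch("GATTACA", "GATACA")
    print(aligned_a)
    print(aligned_b)
    print("score:", total)
```

Both sequences are aligned end to end, so every residue appears in the output, padded with "-" where a gap was introduced.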

In this exercise, you will use EMBOSS Needle to perform a global alignment of the
translated amino acid sequence you generated in 2(c) and the amino acid sequence you
downloaded from the UniProt record in 3(c). Go to EMBOSS Needle and copy and
paste the two sequences into the sequence boxes. Make sure “protein” is selected as the
molecule type and click on the submit button at the bottom. You can also enter your email
address for notifications by ticking the corresponding box.

Answer the following questions based on the alignment result.

a. Write the identity, similarity and gap percentages, and final score for the
alignment. (4 marks)

b. What can you conclude about the alignment of the two sequences from the
above results? (1 mark)

c. What does this indicate about the Expasy translate prediction you performed
during question 2? (2 marks)
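
The identity and gap percentages asked for in part (a) are column counts over the finished alignment. The sketch below shows one common way to compute them, assuming both values are expressed relative to the full alignment length (including gap columns). Similarity is left out because it depends on the substitution matrix used. The function name is hypothetical.

```python
def alignment_stats(aligned_a, aligned_b):
    """Identity and gap percentages of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    length = len(aligned_a)
    pairs = list(zip(aligned_a, aligned_b))
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    return {
        "length": length,
        "identity_pct": 100 * identical / length,
        "gap_pct": 100 * gaps / length,
    }


if __name__ == "__main__":
    stats = alignment_stats("GATTACA", "GA-TACA")
    print(f"identity {stats['identity_pct']:.1f}%  gaps {stats['gap_pct']:.1f}%")
```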

5. Conducting a multiple sequence alignment using the ClustalW algorithm. (8 marks)

You will perform a multiple sequence alignment using the ClustalW algorithm in MEGA
software. First, you will download sequences similar to the amino acid sequence you
downloaded in 3(c) by performing a Protein BLAST at the NCBI.

Go to the NCBI BLAST using the following link or a Google search. Then, select the
Protein BLAST search.
https://blast.ncbi.nlm.nih.gov/Blast.cgi

Click on the “choose file” button under the “enter query sequence” section and upload the
amino acid sequence you downloaded in 3(c) in FASTA format. Alternatively, you can
copy and paste the sequence into the sequence box.

Then, make sure UniProtKB/Swiss-Prot is selected as the database and blastp is
selected as the algorithm. Next, expand the Algorithm parameters and limit the Max
target sequences to 10. Finally, tick “show results in a new window” and click on the
BLAST button.

This will return the 10 protein sequences in UniProtKB/Swiss-Prot that are most similar
to the input sequence. Click on the download button on top of the hit list and select the
“FASTA (complete sequence)” option. This will download the 10 sequences in FASTA
format within a single file. Change the file extension of this file from “.txt” to “.fasta”.
Now, you can open this file in the MEGA software.

To perform the multiple sequence alignment, first, open the MEGA software. Then, click
on the Align button and select “Edit/Build Alignment”. Then select “Retrieve sequences
from a file” from the next window and click “Ok”. Then locate the FASTA file that you
downloaded before and open the file. Now, you will see all 10 sequences in the
sequence viewer.

Then, click on the Edit menu and select the “Select All” option. This will select all the
sequences. Then, click on the Alignment menu and click on the “Align by ClustalW”
option. In the resulting parameter box, change the Protein Weight Matrix to BLOSUM
(click on the “Weight” submenu). Keep the defaults for the remaining parameters and click
on the “Ok” button. This will run the multiple sequence alignment.

The aligned sequences will now appear in the sequence viewer. Deselect the initial
selection by clicking on one sequence. This will bring back the colors so that you can
observe similar amino acid residues in the aligned sequences. Fully conserved sites are
marked with a star (*) at the top of the site.
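
The star markers can be reproduced from the alignment itself: a column is fully conserved when every sequence carries the same residue there. Below is a small sketch of that rule, assuming that a column containing any gap is never counted as conserved; the function name and the three short sequences are made up for illustration.

```python
def conservation_line(aligned_seqs):
    """Return a string with '*' at fully conserved columns and ' ' elsewhere."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned_seqs):
        first = column[0]
        conserved = first != "-" and all(residue == first for residue in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    alignment = ["MKV-LA", "MKVQLA", "MRV-LA"]  # made-up aligned sequences
    print(conservation_line(alignment))
    for seq in alignment:
        print(seq)
```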

You can save the session by clicking on the Data menu and selecting the “save session”
option. Or you can click on “Export alignment” in the Data menu and export the
alignment in FASTA format.

Now, access the UniProt record of the input sequence (as you did in 3(b)) and
locate the corresponding Pfam record.

a. Write the names of the distinct domains found in the sequence with their
sequence coordinates. Also, take a screenshot of the domain organization
diagram and paste it below. (6 marks)

b. Now, you can observe the locations of the above domains in the sequence
alignment. First, click on the very first amino acid residue of the input sequence
(from 3(b)) in the MEGA alignment viewer. At the bottom of the window, you will see
the position number. Make sure to select the “W/O Gaps” option so that gaps are not
counted. Now, by clicking on different residues, you can locate the domain regions
given in the Pfam record. What can you conclude about the conservation of residues in
domain regions? (2 marks)
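
The “W/O Gaps” option matters because Pfam domain coordinates refer to residue numbers in the original, ungapped sequence, while alignment columns also count gap characters. The following sketch shows the conversion under that assumption; the function names, the aligned sequence, and the coordinates are hypothetical.

```python
def column_of_residue(aligned_seq, residue_number):
    """0-based alignment column of the given 1-based ungapped residue number."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds the ungapped sequence length")


def domain_columns(aligned_seq, start, end):
    """Map a domain given as 1-based residue coordinates to alignment columns."""
    return column_of_residue(aligned_seq, start), column_of_residue(aligned_seq, end)


if __name__ == "__main__":
    aligned = "M--KV-LA"  # made-up aligned sequence; residues are M, K, V, L, A
    print(domain_columns(aligned, 2, 4))  # columns covering residues K to L
```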
