Diploma - Practical
1. Needleman-Wunsch Algorithm - Global Alignment of Sequences
Aim :
To perform global sequence alignment between two nucleotide or amino acid sequences and to find out their
structural or functional similarity.
Procedure:
The two sequences can be aligned globally using different algorithms. The Needleman-Wunsch algorithm, a
dynamic-programming method that aligns the two sequences over their entire length, is the standard algorithm
for global alignment. It can be run with the online tool EMBOSS Needle (EMBOSS: European Molecular Biology
Open Software Suite).
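The recurrence behind the Needleman-Wunsch algorithm can be sketched in Python as follows. This is a
simplified illustration, not EMBOSS Needle's own code: it assumes a linear gap penalty and hypothetical
scores (match +1, mismatch -1, gap -2) instead of the DNAfull matrix with separate gap open and gap extend
penalties. Each cell score[i][j] holds the best score for aligning the first i letters of one sequence with
the first j letters of the other, and the traceback starts from the bottom-right cell so that both sequences
are aligned from end to end.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (simplified scoring, assumed
    for illustration; EMBOSS Needle uses a substitution matrix and affine gaps)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback from the bottom-right corner: global alignment covers both full sequences
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1  # gap in b
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1  # gap in a
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)  # one deletion is unavoidable: 6 matches (+6) and one gap (-2) give 4
print(top)
print(bottom)
```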
1. To download the data and get access to the tools, go to the Simulator tab.
2. Open the tool EMBOSS Needle.
3. Copy and paste the first nucleotide sequence in FASTA format (a text format in which a header line
beginning with ‘>’ is followed by the sequence) into the Step 1 input box.
4. Alternatively, select a sequence file with the “Choose File” option and upload it.
5. Similarly, copy and paste or upload the second sequence for the alignment.
6. EMBOSS Needle is preset with the scoring matrices DNAfull for nucleotide sequences and
BLOSUM62 for protein sequences.
7. The gap open and gap extend penalties can be changed to user-defined values. In this example they are kept
at their default values.
8. If the email checkbox is checked and an email address is entered, the user is notified of the results
by email.
9. After the job is submitted by clicking the Submit button, the results are displayed within a few
minutes. The Results page comprises three tabs: Alignment, Submission Details and Submit
Another Job.
10. The Submission Details tab shows the program used, the date and time when the program was
launched and the internal command used to run it. The user can also download the input and output
files from this tab.
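Step 7 mentions separate gap open and gap extend penalties. A minimal sketch of how such affine penalties
add up, assuming the convention that a gap of length k costs gap_open + (k - 1) × gap_extend and assuming
10.0 and 0.5 as the default values (the function names are hypothetical, not part of EMBOSS):

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap of `length` positions, assuming the convention
    open + (length - 1) * extend (the first gap position costs gap_open)."""
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


def total_gap_penalty(aligned):
    """Sum the penalties of every run of '-' in one aligned sequence."""
    total, run = 0.0, 0
    for ch in aligned + " ":  # trailing sentinel flushes a gap run at the end
        if ch == "-":
            run += 1
        else:
            total += affine_gap_penalty(run)
            run = 0
    return total


print(total_gap_penalty("AC---GTA"))   # one gap of length 3
print(total_gap_penalty("A-C-G-TA"))   # three separate gaps of length 1
```

With these values one gap of three positions costs 11.0, while three separate one-position gaps cost 30.0,
which is why an aligner using affine penalties prefers a few long gaps over many short ones.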
Results
The Alignment tab shows the alignment of the two sequences together with the parameters used, the
scoring matrix and the gap penalty values.
The Alignment tab also lets the user download the entire alignment file by clicking the
button “View Alignment file”.
In the alignment between the two sequences, gaps are represented with ‘-’. A match between two
nucleotides is marked with the symbol ‘|’ and a mismatch with a dot ‘.’.
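The match line described above can be rebuilt from two aligned sequences. This is a simplified sketch
(hypothetical function names): it marks identical letters with ‘|’, mismatches with ‘.’ and gap columns
with a space, and it ignores the ‘:’ similarity mark that EMBOSS also uses for similar amino acids. The
statistics mirror the Identity and Gaps lines of the report, with percentages taken over the full
alignment length.

```python
def markup_line(top, bottom):
    """Simplified match line: '|' identical, '.' mismatch, ' ' where either has a gap."""
    symbols = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            symbols.append(" ")
        elif x == y:
            symbols.append("|")
        else:
            symbols.append(".")
    return "".join(symbols)


def alignment_stats(top, bottom):
    """Identity and gap counts over the whole alignment length (gap columns included)."""
    length = len(top)
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(top, bottom) if x == "-" or y == "-")
    return {"length": length, "identity": identical, "gaps": gaps,
            "percent_identity": round(100.0 * identical / length, 1)}


top, bottom = "GATTACA", "GA-TACG"
print(top)
print(markup_line(top, bottom))
print(bottom)
print(alignment_stats(top, bottom))
```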
2. Smith-Waterman Algorithm - Local Alignment of Sequences
Aim :
To perform local sequence alignment between two nucleotide or amino acid sequences and find out structural or
functional similarity.
Procedure
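The two sequences can be aligned locally using different algorithms. The Smith-Waterman algorithm, a
dynamic-programming method that finds the highest-scoring region of similarity between the two sequences,
is the standard algorithm for local alignment. It can be run with the online tool EMBOSS Water.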
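The Smith-Waterman recurrence differs from Needleman-Wunsch in two ways: cell scores are never allowed to
drop below zero, so an alignment can start anywhere, and the traceback starts from the highest-scoring cell
and stops at the first zero. A minimal Python sketch, assuming a linear gap penalty and hypothetical scores
(match +2, mismatch -1, gap -2) rather than EMBOSS Water's matrices and affine gaps:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (simplified scoring assumed for illustration)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at zero: a poor prefix is dropped and a new local alignment begins
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback from the best cell until a cell with score zero is reached
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))  # only the shared region ACGT is reported
```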
1. To download the data and get access to the tools, go to the Simulator tab.
2. Open the tool EMBOSS Water.
3. Copy and paste the first nucleotide sequence in FASTA format (a text format in which a header line
beginning with ‘>’ is followed by the sequence) into the Step 1 input box.
4. Alternatively, select a sequence file with the “Choose File” option and upload it.
5. Similarly, copy and paste or upload the second sequence for the alignment.
6. EMBOSS Water is preset with the scoring matrices DNAfull for nucleotide sequences and
BLOSUM62 for protein sequences.
7. The gap open and gap extend penalties can be changed to user-defined values. In this example they are
kept at their default values.
8. If the email checkbox is checked and an email address is entered, the user is notified of the results
by email.
9. After the job is submitted by clicking the Submit button, the results are displayed within a few
minutes. The Results page comprises three tabs: Alignment, Submission Details and Submit
Another Job.
10. The Alignment tab shows the alignment of the two sequences together with the parameters used, the
scoring matrix and the gap penalty values.
Results
The Alignment tab also lets the user download the entire alignment file by clicking the button
“View Alignment File”.
In the alignment between the two sequences, gaps are represented with ‘-’. A match between two
nucleotides is marked with the symbol ‘|’ and a mismatch with a dot ‘.’.
The Submission Details tab shows details such as the program used, the date and time when the
program was launched and the internal command used to run it. The user can also download the
input and output files from this tab.
3. Pairwise Sequence Alignment using BLAST
Aim :
Procedure
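Step 1: Choose the BLAST program
The user has to choose the BLAST program: BLASTn (nucleotide query against a nucleotide database),
BLASTp (protein query against a protein database), BLASTx (translated nucleotide query against a protein
database), tBLASTn (protein query against a translated nucleotide database) or tBLASTx (translated
nucleotide query against a translated nucleotide database).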
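The choice of BLAST program depends only on the molecule type of the query and of the database. A small
lookup sketch (the table and the function choose_program are hypothetical helpers, not part of any BLAST
tool):

```python
# Query type and database type compared by each BLAST program
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def choose_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program from the query and database molecule types
    ('nucleotide' or 'protein'); translate_both=True selects tblastx."""
    if query_type == "nucleotide" and database_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and database_type == "protein":
        return "blastx"
    if query_type == "protein" and database_type == "nucleotide":
        return "tblastn"
    raise ValueError("types must be 'nucleotide' or 'protein'")


program = choose_program("nucleotide", "protein")
print(program, BLAST_PROGRAMS[program])
```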
Step 2: Enter the query sequence
Enter a query sequence by pasting it into the query box or by uploading a FASTA file that contains the
sequence for the similarity search. This step is the same for all BLAST programs. The user can give an
accession number, a gi number or a raw FASTA sequence. Go to the Simulator tab to learn more about how
to retrieve the query sequence.
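FASTA-formatted input, as pasted into the query box, can be read with a few lines of Python. This is a
minimal sketch (parse_fasta is a hypothetical helper): each record starts with a ‘>’ header line, the
following lines are joined into one sequence, and blank lines are skipped.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.upper())  # sequences may be split over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 example nucleotide sequence
ACGTACGT
acgt
>seq2
MKVLAAGIV
"""
for name, seq in parse_fasta(example):
    print(name, len(seq))
```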
Step 3: Select the database
The user first has to know which databases are available and what type of sequences they contain.
A sequence similarity search looks for sequences similar to the query in the selected database.
Step 4: Select the algorithm and the parameters of the algorithm for the search
There are different algorithms for some of the BLAST programs, and the user has to specify which one to use.
Nucleotide BLAST offers MegaBLAST, which searches for highly similar sequences; discontiguous MegaBLAST,
which searches for more dissimilar sequences; and BLASTn, which searches for somewhat similar sequences.
Protein BLAST offers BLASTp, which compares a protein query with a protein database; PSI-BLAST
(Position-Specific Iterated BLAST), which repeats the search, each time using a position-specific scoring
matrix built from the previous hits; PHI-BLAST (Pattern Hit Initiated BLAST), which searches the database
for sequences that contain a particular pattern present in the query (the user has to enter the pattern in
the PHI pattern box provided); and DELTA-BLAST (Domain Enhanced Lookup Time Accelerated BLAST), which builds
a position-specific scoring matrix from conserved domains that match the query and uses it to search for
distantly related (homologous) proteins. The algorithm parameters required to run BLAST programs include
the maximum number of target sequences, automatic adjustment for short queries, the E-value (expect
threshold), word size, query range, scoring parameters (match/mismatch scores and gap penalties) and
filters (Filter and Mask, e.g. for low-complexity regions). Default values are provided, but the user can
adjust them as needed.
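The word size parameter controls how BLAST finds starting points (seeds) for its alignments: only regions
that share a word of this length with the query are extended. A simplified seeding sketch for nucleotide
sequences (word_hits is a hypothetical helper; it finds exact word matches only, whereas protein BLAST also
accepts similar neighbourhood words, and the extension step is omitted):

```python
def word_hits(query, subject, word_size=3):
    """Exact word (k-mer) matches between query and subject as
    (query_position, subject_position) pairs, 0-based."""
    # Index every word of the subject by its start positions
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in the index
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTAC", "TTACGT", word_size=3))
print(word_hits("ACGTAC", "TTACGT", word_size=5))
```

A larger word size gives fewer seeds, so the search is faster but can miss weaker similarities; this is the
trade-off between MegaBLAST (highly similar sequences) and the more sensitive searches.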
Step 5: Submit the search
Submit the search by clicking the BLAST button at the bottom of the page.
Screen shot of result.
Result:
After the query sequence is submitted for the similarity search, the result page appears with
information such as the query ID, description, molecule type, sequence length, database name and
BLAST program. It also shows any putative conserved domains detected in the query during the
search.
The query sequence is represented as a numbered red bar below the color key. Database hits are shown
below the query bar, colored according to their alignment scores, and the most similar sequences are
placed nearest to the query bar. More details about each alignment can be seen by moving the mouse
pointer over the colored bars.
Each alignment is preceded by the sequence identifier and definition line, the length of the matched
sequence, and the score and E-value. The next line gives the number of identical residues in the
alignment (Identities), the number of positives (Positives) and the number of gaps (Gaps). Finally comes
the alignment itself, with the query sequence on top and the database (subject) sequence below it. The
numbers on either side of the alignment indicate the positions of the amino acids or nucleotides in each
sequence.
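The score and E-value shown for each hit are linked: for a bit score S', the expected number of chance hits
with at least that score is E = m × n × 2^(-S'), where m is the query length and n the total length of the
database. A minimal sketch (evalue_from_bits is a hypothetical helper; BLAST itself uses effective lengths
corrected for edge effects, so its E-values will differ slightly from this raw-length version):

```python
def evalue_from_bits(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'): expected number of chance hits scoring at least
    `bit_score` bits (raw lengths used here instead of effective lengths)."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Each extra bit of score halves the E-value; a longer query or a larger
# database increases it proportionally.
for bits in (30, 31, 40, 50):
    print(bits, evalue_from_bits(bits, 300, 1_000_000))
```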
4. Sequence Similarity Search using FASTA
Procedure
Step 1: Select the database to search
Databases are required to run a sequence similarity search, and several databases can be searched at the
same time. The different databases are
Step 2: Enter the query sequence
The query sequence can be entered directly in GCG, FASTA, EMBL, GenBank, PIR, NBRF,
PHYLIP or UniProtKB/Swiss-Prot format.
A file containing a valid sequence in any of these formats can also be uploaded as the query for the
similarity search. The sequence type (PROTEIN, DNA or RNA) indicates the type of the query sequence.
Go to the Simulator tab to learn more about how to retrieve the query sequence.
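Step 3: Select the program, scoring matrix and parameters
The user has to specify the program and the substitution matrix used for scoring. The available programs
are FASTA, FASTX, FASTY, SSEARCH, GGSEARCH and GLSEARCH (FASTX and FASTY compare a translated DNA
query with a protein database; SSEARCH performs Smith-Waterman local alignment; GGSEARCH and GLSEARCH
perform global alignments). Substitution matrices are used to score the alignments; the options include
BLOSUM50, BLOSUM62, BLASTP62, PAM120, PAM250, MDM10, MDM20 and MDM40, with BLOSUM50 as the default.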
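The parameters include:
Gap open and gap extend penalties: gaps represent insertions or deletions (indels), a common kind of
mutation. The gap open penalty is charged for starting a gap and the gap extend penalty for lengthening
it. Low gap penalties allow more gaps and give higher alignment scores, but many gaps also make the
alignment less reliable.
Expectation value (E-value): the number of alignments with at least this score expected by chance in a
search of this database; it decreases exponentially as the alignment score increases.
Strand, Histogram, Filter: Strand selects which DNA strand(s) of the query are searched, Histogram gives a
graphical representation of the distribution of scores, and Filter masks low-complexity regions of the
query.
Statistical estimates, Scores, Alignments, Sequence range and Database range: set the statistical method
used, the number of scores and alignments reported, and the part of the query sequence and the range of
database sequences to be searched.
HSPs, Score format, Translation table: HSPs and Score format control how high-scoring segment pairs and
scores are reported, and the Translation table selects the genetic code used to translate nucleotide
sequences.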
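The translation table matters for the programs that translate DNA (FASTX, FASTY). A minimal sketch of
translation with the standard genetic code (NCBI translation table 1), assuming forward-strand frames only;
the names BASES, AMINO_ACIDS, CODON_TABLE and translate are hypothetical helpers:

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate DNA (or RNA) in a forward reading frame (0, 1 or 2).
    '*' marks a stop codon and 'X' an unrecognised codon."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for p in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[p:p + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
print(translate("CATGAAA", frame=1))
```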
Step 4: Submission
Clicking Submit opens the result page in another window. The job runs interactively: when it is
complete, the result is displayed in the browser. The result can also be sent to a valid email address
entered in the text box.
Result
The result page gives information such as the sequences aligned in the similarity search, their
database ID and source, cross-references (Gene expression, Molecule type, Nucleotide sequences,
Genomics, Protein sequences, Ontologies, Enzymes, Protein families and Literature), followed by the
sequence length, score, identities, positives and E-value.
The tool output gives complete statistical details of the similarity search.
The FASTA visual output shows the query and subject matches with their E-values in a colourful
schematic.
5. Aligning Multiple Sequences with CLUSTAL W
Aim :
To align three or more sequences and find out the structural and functional relationships between these
sequences.
1. To download the data and get access to the tools, go to the Simulator tab.
2. Open the CLUSTALW tool.
3. Paste the set of sequences into the input box. Each sequence must start with the ‘>’ symbol followed
by the name of the sequence (as in FASTA format), then a line break (Enter key), and then the sequence
itself.
4. The sequences can also be submitted as a file with the “Choose File” option, provided that all the
sequences are in the same format.
5. In the next two steps the user can set the pairwise alignment options and the multiple sequence
alignment options, such as the scoring matrices and scoring values. In most cases the default
parameters are used.
6. The user is notified of the results by email if the email notification option is checked.
7. After the job has finished, the alignment can be downloaded into a file by clicking the option
“Download alignment file”.
8. The Result Summary tab gives a summary of the different outputs, with a link to each one.
Result
The result files include the input and output files of the alignment in different formats.
If the Java plug-in is disabled in the browser, the user can enable it and then use Jalview to view the
alignment in colour.
The user can view and save the output file by clicking the button “View output file”.
The output file gives the length of each sequence and the score of each pairwise alignment.
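In the Clustal alignment output, a conservation line under each block marks fully conserved columns with
‘*’. A minimal sketch of that mark (conservation_line is a hypothetical helper); it omits the ‘:’ and ‘.’
marks, which need Clustal's amino acid similarity groups:

```python
def conservation_line(aligned):
    """'*' under columns where every sequence has the same residue and no gap;
    Clustal's ':' and '.' similarity marks are not computed here."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    line = []
    for col in range(length):
        column = {s[col] for s in aligned}
        line.append("*" if len(column) == 1 and "-" not in column else " ")
    return "".join(line)


block = ["ACGT-A", "ACCTGA", "ACGTTA"]
for s in block:
    print(s)
print(conservation_line(block))
```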
6 . Construction of Cladogram
Aim :
To find the evolutionary relationships between different organisms and to analyze the changes that
occurred in them during the course of evolution, using PHYLIP.
1. Align the multiple DNA sequences with ClustalW and save the alignment in PHYLIP format as
infile.phy. Start the Dnadist program by clicking its icon and give this file as input.
2. All PHYLIP programs are menu-driven. Dnadist calculates the pairwise distances between the
sequences. It first looks for the input file in the PHYLIP folder; if the file does not exist, it asks
for the correct file name. After the input is given, it displays a menu in which any setting can be
changed by typing its first letter or number. If no changes are required, typing ‘Y’ starts the
program. The output is written to a file named outfile, which can be used as the input of another
program.
3. Like Dnadist, Neighbor is a distance-based program. The output of Dnadist is given as input to
Neighbor, which writes its results to outfile and the tree to outtree.
4. Neighbor computes the tree and its branch lengths with the neighbor-joining method; the outfile and
outtree are produced after the neighbor-joining run.
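Step 1 requires the alignment in PHYLIP format. A minimal sketch of the sequential PHYLIP layout (to_phylip
is a hypothetical helper): a header line with the number of sequences and the alignment length, then one
line per sequence whose name is padded or cut to exactly 10 characters.

```python
def to_phylip(records):
    """Sequential PHYLIP text from (name, aligned_sequence) pairs:
    header 'N L', then each name padded or truncated to 10 characters."""
    if not records:
        raise ValueError("no sequences given")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must be aligned to the same length")
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


text = to_phylip([("human", "ACGT-ACG"), ("chimpanzee_1", "ACGTTACG"), ("gorilla", "ACGA-ACG")])
print(text)
# The text could then be saved under the name used above, e.g.
# open("infile.phy", "w").write(text)
```

Names longer than 10 characters are cut, so they should be made unique within their first 10 characters.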
RESULT:
The cladogram is obtained with the consensus tree program (Consense). Its input is the tree file
(outtree) produced by Neighbor, and Consense writes its results to a new outfile and outtree.
The result is the consensus tree. The number on each branch indicates how many of the input trees
contain the partition of the species into the two sets separated by that branch.
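The branch numbers described above are counts of bipartitions. A minimal sketch of that counting, assuming
each tree is written as nested Python tuples (a hypothetical input format, not PHYLIP's Newick files) and
storing each split as the side that does not contain the alphabetically first taxon, so that the position of
the root does not matter:

```python
from collections import Counter


def leaves(tree):
    """Taxon names under a nested-tuple tree such as (("A", "B"), ("C", "D"))."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the taxa in one tree."""
    taxa = leaves(tree)
    reference = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        # Skip single taxa, their complements and the whole tree (trivial splits)
        if 1 < len(side) < len(taxa) - 1:
            found.add(side if reference not in side else taxa - side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def count_splits(trees):
    """Number of input trees that contain each bipartition."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree))
    return counts


trees = [((("A", "B"), "C"), ("D", "E")),
         ((("A", "B"), "D"), ("C", "E")),
         (("A", "B"), ("C", ("D", "E")))]
for split, n in count_splits(trees).most_common():
    print(sorted(split), n)
```

A majority-rule consensus tree keeps the splits found in more than half of the trees; here {A, B} versus
{C, D, E} occurs in all three trees and {D, E} in two.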
7 . Phylogenetic Analysis using PHYLIP - Rooted trees
Aim :
To find the evolutionary relationships between different organisms on a time scale and to analyze the
changes that occurred in these organisms, using PHYLIP.
Procedure
Align the multiple DNA sequences with ClustalW and save the alignment in PHYLIP format as infile.phy.
Start the Dnadist program by clicking its icon and give this file as input.
All PHYLIP programs are menu-driven. Dnadist calculates the pairwise distances between the
sequences. It first looks for the input file in the PHYLIP folder.
If the file does not exist, it asks for the correct file name. After the input is given, it displays a menu
in which any setting can be changed by typing its first letter or number.
If no changes are required, typing ‘Y’ starts the program. The output is written to a file named outfile,
which can be used as the input of another program.
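Dnadist offers several substitution models (its default is F84). The simplest of them, Jukes-Cantor,
shows how an observed proportion of differing sites p is corrected for multiple substitutions at the same
site: d = -3/4 ln(1 - 4p/3). A minimal sketch (p_distance and jukes_cantor are hypothetical helpers;
columns with a gap in either sequence are ignored, which is an assumption of this sketch):

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


a, b = "ACGTACGTAC", "ACGTACGTAA"
print(p_distance(a, b), jukes_cantor(a, b))  # the corrected distance is slightly larger than p
```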
Like Dnadist, Neighbor is a distance-based program. The output of Dnadist is given as input to
Neighbor, which writes its results to outfile and the tree to outtree.
The tree and its branch lengths are computed with the neighbor-joining method.
Result
Rooted trees are drawn with Drawgram, using the tree file (outtree) obtained from the neighbor-joining run
as input. A rooted tree starts from a hypothetical root, representing the common ancestor, from which all
the other sequences branch.
8 . Phylogenetic Analysis using PHYLIP - Unrooted trees
Aim :
To find the evolutionary relationships between organisms and to analyze the changes occurring in these
organisms during evolution, using PHYLIP.
Procedure
Align the multiple DNA sequences with ClustalW and save the alignment in PHYLIP format as infile.phy.
Start the Dnadist program by clicking its icon and give this file as input.
All PHYLIP programs are menu-driven. Dnadist calculates the pairwise distances between the
sequences. It first looks for the input file in the PHYLIP folder.
If the file does not exist, it asks for the correct file name. After the input is given, it displays a menu
in which any setting can be changed by typing its first letter or number.
If no changes are required, typing ‘Y’ starts the program. The output is written to a file named outfile,
which can be used as the input of another program.
Like Dnadist, Neighbor is a distance-based program. The output of Dnadist is given as input to
Neighbor.
The tree and its branch lengths are computed with the neighbor-joining method, and the results are
written to outfile and outtree.
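The neighbor-joining step performed by Neighbor can be sketched with the textbook Saitou-Nei procedure: at
each round, join the pair that minimises Q(x, y) = (n - 2) d(x, y) - r(x) - r(y), where r is a row sum of
the distance matrix, then replace the pair with a new node. This is an illustrative sketch, not PHYLIP's
code; the function name is hypothetical, taxon names are assumed to contain no Newick punctuation, and
ties are broken by input order, so PHYLIP may print an equivalent tree differently.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix `dist` (list of lists,
    rows in the same order as `names`). Returns an unrooted Newick tree."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    # d[(x, y)] holds the current distance between any two active nodes
    d = {(x, y): dist[i][j] for i, x in enumerate(nodes) for j, y in enumerate(nodes)}
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x, y] for y in nodes) for x in nodes}  # row sums
        pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y)
        a, b = min(pairs, key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        la = 0.5 * d[a, b] + (r[a] - r[b]) / (2 * (n - 2))  # branch from a to the new node
        lb = d[a, b] - la                                    # branch from b to the new node
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        for c in nodes:
            if c not in (a, b):
                d[new, c] = d[c, new] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        d[new, new] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Three nodes left: join them at one central node (an unrooted trifurcation)
    x, y, z = nodes
    lx = 0.5 * (d[x, y] + d[x, z] - d[y, z])
    ly = 0.5 * (d[x, y] + d[y, z] - d[x, z])
    lz = 0.5 * (d[x, z] + d[y, z] - d[x, y])
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


names = ["A", "B", "C", "D"]
dist = [[0, 3, 5, 5],
        [3, 0, 6, 6],
        [5, 6, 0, 2],
        [5, 6, 2, 0]]
print(neighbor_joining(names, dist))
```

For distances that fit a tree exactly, as in this example, the original branch lengths are recovered; with
real data some branch lengths can come out negative.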
Result :
Unrooted trees are drawn with Drawtree, using the outtree from the previous program as input.
9 . Genome Annotation and Multiple Sequence Alignment
Aim :
Procedure :
1. From the toolbar, select “Comparative Genomics”. Throughout ASAP (A Systematic Annotation Package),
one works with a specific genome or with a set of experiments on that genome.
2. To begin, select a genome from the drop-down menu and click OK. This opens a page where one has to
select a sequence version or an experiment set. Choosing the correct version of the genome is
important. The home page is shown below.
3. Select the corresponding option from the sequence options given below. In the annotation page, select
the features from “Feature type”, indicating the genome of the related feature, the gene name, the type
of relationship and the curation (approval) status.
4. Clicking “Show orthologs” opens the gene expression result data, based on the feature ID, feature type
and gene product. The feature ID gives all the information about the gene that has been added to ASAP
up to the current time.
5. “Download DB-Friendly Format” lets one retrieve the data as a text file. The “Next” and “Previous”
buttons move to the following and preceding pages of the feature list. The screen shot is shown below.
6. Each record in the feature list has full information on the annotated gene. Clicking a record opens
the related information page.
Select the type of query sequence, enter the DNA or protein query sequence in FASTA format in the query
box (or upload a file that contains the FASTA sequence) and set the parameters such as search type,
database sequence type, E-value, filter, descriptions and alignments.
Choose a database appropriate to the query. Select the genome and version from the database that is
related to the query sequence, as shown below.
To run the BLAST search, give the sequence and set the parameters for the query sequence.
Result :
After the input is given and the parameters are set, clicking Search opens the BLAST result page. It
shows references for the ASAP BLAST program and for composition-based statistics.
It also gives the number of letters in the query sequence, the GI and accession numbers and protein
names of the hits, and the databases selected by the user. The list of sequences, the number of sequences
in the database, the number of letters in the database, and the lambda and gapped lambda values can also
be obtained. The Matrix section at the end gives the scoring matrix together with information about the
gap penalties, the number of sequences, hits and extensions, and the databases.
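The lambda values listed in the report (together with the constant K printed beside them) convert a raw
alignment score S into a bit score: S' = (λS - ln K) / ln 2. For gapped alignments the gapped lambda and
gapped K are used. A minimal sketch (bit_score is a hypothetical helper; the parameter values below are
placeholders for illustration, not values from any real report):

```python
import math


def bit_score(raw_score, lam, k):
    """Normalised (bit) score S' = (lambda * S - ln K) / ln 2, using the
    Karlin-Altschul parameters lambda and K printed in a BLAST report."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder parameters: read the actual lambda and K from the report footer
print(bit_score(100, lam=0.3, k=0.05))
```

Because the bit score already accounts for λ and K, bit scores from searches with different scoring
systems can be compared directly.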