
BI Manual

The document contains details about 7 bioinformatics practicals to be completed by Fazila Fatima, a 7th semester student in the Department of Zoology. The practicals include retrieving FASTA sequences from NCBI, determining protein parameters using ProtParam, finding similar sequences using BLAST, and performing multiple sequence alignment using ClustalW. Methods for each practical are described.

Uploaded by

Fazila

Name: Fazila Fatima

Department: Zoology
Roll No.: bsf2001858
Semester: 7th

Bioinformatics
Practicals

Practical # 1: Retrieval of FASTA sequence
Practical # 2: Determination of the physical and chemical parameters of proteins
Practical # 3: Finding similar sequences for proteins and DNA
Practical # 4: Multiple alignment
Practical # 5: Predicting protein secondary structure
Practical # 6: Predicting RNA secondary structure
Practical # 8: Finding protein families

Practical # 1
Retrieval of FASTA sequence using NCBI
Introduction to NCBI
NCBI stands for the National Center for Biotechnology Information. It is a division of the
National Library of Medicine (NLM) at the U.S. National Institutes of Health and a leader in the
field of bioinformatics. It studies computational approaches to fundamental questions in biology
and provides online delivery of biomedical information and bioinformatics tools. NCBI produces a
variety of online information resources for biology, including the GenBank nucleic acid sequence
database and the PubMed database of citations and abstracts published in life science journals.
NCBI provides search and retrieval operations for most of these data from 35 distinct databases.
Entrez Global Query is an integrated search and retrieval system that provides access to all of
these databases simultaneously with a single query string and user interface.

Retrieving FASTA sequence for nucleotide


Home Page
NCBI has a simplified homepage from which the user can navigate to different resources. The
left side pane of the homepage has a site map followed by different categories, which narrow
down the search for the right sequence. On the right side is a list of popular resources, which
is very useful for first-time users.

METHODS
• Open the NCBI website https://www.ncbi.nlm.nih.gov in a web browser.
• Open the database list and choose the option Nucleotide.

Open databases

• Enter the gene name EGFR in the search box and click Search. Then select the species on the
right side.
• Click Homo sapiens in the taxon bar; the result page for the gene opens.

Retrieve FASTA
FASTA Sequence:
A gene is the molecular unit of heredity of a living organism. It is the name given to certain
stretches of DNA and RNA that code for a type of protein or a strand of RNA that has some
function in an organism. Knowledge of gene sequences has become indispensable for basic
biological research, other research branches using sequencing, and in many applied fields such
as diagnostics, biotechnology, forensic biology, and biological systematics. In bioinformatics, the
FASTA format is a textual format for representing either nucleotide sequences or peptide
sequences in which nucleotides or amino acids are represented by single-letter codes. The
format also allows for sequence names and comments before sequences. The format originates
from the FASTA software package but has now become a standard in bioinformatics.
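To make the format concrete, here is a minimal Python sketch of a FASTA reader. It assumes the usual convention that a record starts with a '>' header line and that the sequence may continue over several lines; the function name `parse_fasta` and the two example records are hypothetical and not part of the practical.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its name/comment
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next header
            records[header] += line.upper()
    return records


# Hypothetical two-record example (not real EGFR data)
example = """>seq1 hypothetical example record
ATGCGT
ACGT
>seq2
GGCC
"""
records = parse_fasta(example)
```

The dictionary keeps records in file order, so `list(records)` gives the sequence names as they appeared in the file.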
• Review the information about the gene, then retrieve its sequence in FASTA format by
clicking the FASTA link in the left corner of the record page.
FASTA Sequence of EGFR gene
CTGGTTGTGCATTTGCTGTGGGTTCCCTCCGGCAGGCGACCTCTCCGCGCTGAGAAGGTTATCCGGATAAC
CAAGTAATTATGTGGTGACAGATCACGGCTCGTGCGTCCGAGCCTGTGGGGCCGACAGCTATGAGATGGA
GGAAGACGGCGTCCGCAAGTGTAAGAAGTGCGAAGGGCCTTGCCGCAAAGTGTGTAACGGAATAGGTAT
TGGTGAATTTAAAGACTCACTCTCCATAAATGCTACGAATATTAAACACTTCAAAAACTGCACCTCCATCAGT
GGCGATC
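The same retrieval can also be scripted. The sketch below builds a request for the NCBI E-utilities `efetch` service, which returns a record as FASTA text. The accession `NM_005228` (a RefSeq EGFR transcript) is only an assumed example, since the manual does not state which record was used, and the helper names `build_efetch_url` and `fetch_fasta` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL asking for one record in FASTA text format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accession: str, db: str = "nuccore") -> str:
    """Download the FASTA text for an accession (requires internet access)."""
    with urlopen(build_efetch_url(accession, db)) as response:
        return response.read().decode("utf-8")


# Assumed example accession; replace it with the record chosen on the NCBI result page
url = build_efetch_url("NM_005228")
```

Calling `fetch_fasta("NM_005228")` performs the actual download; it is left out of the example run so that the sketch works offline.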
Practical # 2
Determination of the physical and chemical parameters of proteins
Physical Parameters:
A physical parameter is a characteristic of matter that can be observed and measured without
changing the chemical identity of a sample. Measuring a physical property may change the
arrangement of matter in a sample but not the structure of its molecules. For a protein, examples
are the molecular weight and the extinction coefficient.

Chemical Parameters:
A chemical parameter describes the chemical composition or reactivity of a sample. For a protein,
examples are the amino acid composition, the atomic composition and formula, the number of
positively and negatively charged residues, and the theoretical pI.

The tool used for the chemical and physical properties of a protein is Expasy ProtParam.
ProtParam
ProtParam is a tool that calculates various physical and chemical parameters for a given protein
in Swiss-Prot or TrEMBL or for a user-specified protein sequence. The calculated parameters
include molecular weight, theoretical pI, amino acid composition, atomic composition, extinction
coefficient, estimated half-life, instability index, aliphatic index, and grand average of
hydropathicity (GRAVY).

METHOD
• Open the Expasy ProtParam website https://web.expasy.org/protparam/ in a web browser.
• Paste the amino acid sequence of the EGFR protein (retrieved from NCBI) into the box and
click Compute parameters.
The result page appears and shows different physical and chemical parameters of the EGFR
protein, such as:

• Number of amino acids
• Molecular weight
• Theoretical pI: the pH at which the protein has no net charge, because its positive and
negative charges are equal.
• Amino acid composition
• Total number of negatively and positively charged residues
• Atomic composition
• Formula and total number of atoms
• Extinction coefficient: a measure of how strongly a species absorbs light at a particular
wavelength (ProtParam reports it at 280 nm). It is measured in M⁻¹ cm⁻¹.
• Instability index: an estimate of whether the protein will be stable in a test tube. If the
index is less than 40, the protein is probably stable in the test tube. If it is greater than
40, it is probably not stable.
• Aliphatic index: the relative volume occupied by aliphatic side chains (alanine, valine,
isoleucine, and leucine).
• GRAVY (grand average of hydropathicity): the average hydropathy value of a peptide or
protein, calculated as the sum of the hydropathy values of all its amino acids divided by the
number of residues.
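Several of these parameters are simple enough to recompute by hand, which helps when checking the result page. The following Python sketch assumes the Kyte-Doolittle hydropathy scale for GRAVY, the formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) for the aliphatic index (X in mole percent), and the 280 nm coefficients 1490 (Tyr), 5500 (Trp), and 125 (cystine) for the extinction coefficient. The function names are hypothetical, and the sketch is not ProtParam's own code.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    counts = Counter(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residues(seq: str) -> tuple:
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    counts = Counter(seq)
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


def extinction_coefficient(seq: str, cystines: int = 0) -> int:
    """Extinction coefficient at 280 nm in M^-1 cm^-1.

    `cystines` is the assumed number of disulfide-bonded cysteine pairs;
    0 corresponds to all cysteines being reduced.
    """
    counts = Counter(seq)
    return 1490 * counts["Y"] + 5500 * counts["W"] + 125 * cystines
```

For the practical, the one-letter EGFR protein sequence retrieved from NCBI would be passed as `seq`, and the values compared with the ProtParam output.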

Practical # 3
Finding similar sequences for proteins and DNA
BLAST
BLAST stands for Basic Local Alignment Search Tool. BLAST finds regions of similarity between
biological sequences. The program compares nucleotide or protein sequences to sequence
databases and calculates the statistical significance. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify members of gene
families.

Method:
• Open the BLAST website https://blast.ncbi.nlm.nih.gov/Blast.cgi in a web browser.
• Select Nucleotide BLAST; the blastn suite page opens.

• Paste the FASTA sequence of EGFR into the query box, add a job title in the respective field,
and then click the BLAST button.
The results of BLAST are shown in four parts:
1. Description
2. Graphic summary
3. Alignment
4. Taxonomy

• Description: a list of similar sequences, arranged by default from the most to the least
significant hit (ascending E value). Its main columns are:
Query coverage: the percentage of the query sequence length that is included in the aligned
segments with the database (subject) sequence.
Percentage identity: the percentage of aligned positions at which the query and the subject
sequence have the same base (or amino acid).
E value: the "E-value" (Expect value) is a statistical measure of the expected number of
alignments with a score equal to or better than the one obtained that would occur purely by
chance in a database of the same size. The lower the E value, the more significant the match.
• Graphic summary: a visual representation of the sequence comparison between the query
sequence and the database sequences found to be similar.
• Alignment: the regions of similarity between the query sequence and one or more sequences
from the database, displayed position by position. These regions can provide insights into
potential homology, functional conservation, or evolutionary relationships.
• Taxonomy: the matching sequences classified and organized by the organisms they come from.
This helps to identify the origin and the evolutionary relationships of the sequences that
match the query sequence in the search.
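The three columns of the Description table can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: a single aligned segment, gaps written as "-", and the textbook relation E = m × n × 2^(−S') between a bit score S' and the E value, where m and n are the query and database lengths (BLAST itself uses corrected "effective" lengths). All function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions as a percentage of the alignment length (gaps included)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(query_start: int, query_end: int, query_length: int) -> float:
    """Percentage of the query covered by one aligned segment (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def e_value_from_bit_score(bit_score: float, query_length: int,
                           database_length: int) -> float:
    """Expected number of chance hits with at least this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical toy alignment: 4 identical positions out of 6 columns
identity = percent_identity("ACGT-A", "ACCTTA")
coverage = query_coverage(11, 60, 100)
```

Because the E value scales with the database length, the same alignment receives a larger E value when the search is run against a larger database.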
Practical # 4
Multiple Alignment

Tool: ClustalW
ClustalW is a widely used program for aligning any number of homologous nucleotide or protein
sequences. For multiple sequence alignments, ClustalW uses a progressive alignment method: the
most similar sequences, that is, those with the best alignment score, are aligned first. Then
progressively more distant groups of sequences are aligned until a global alignment is
obtained.

Method of multiple alignment

• Open the ClustalW website https://www.genome.jp/tools-bin/clustalw in a web browser.
• Collect the different FASTA sequences in one file (for example in Word) to get their multiple
alignment. Each sequence should start with a '>' character followed by a sequence identifier
on the same line, and then the sequence itself on the following lines.

• Enter the sequences in the input box and click Execute Multiple Alignment. The output page
opens.
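The input described above can also be assembled programmatically instead of by hand. The sketch below is one way to do it in Python, assuming each sequence is available as a plain string; the function name `to_multi_fasta`, the identifiers, and the short sequences are hypothetical.

```python
from typing import Dict


def to_multi_fasta(sequences: Dict[str, str], width: int = 60) -> str:
    """Format {identifier: sequence} as multi-FASTA text, wrapped at `width` columns."""
    blocks = []
    for identifier, sequence in sequences.items():
        # Split the sequence into fixed-width lines below its '>' header
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        blocks.append(">" + identifier + "\n" + "\n".join(lines))
    return "\n".join(blocks) + "\n"


# Hypothetical identifiers and sequences, wrapped at 4 columns for display
text = to_multi_fasta({"seqA": "ACGTAC", "seqB": "GG"}, width=4)
```

The resulting text can be pasted directly into the ClustalW input box.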
Result Interpretation
• Conserved regions: positions where most or all sequences have the same nucleotide (A, T, C,
G). Conserved regions are often biologically significant, as they may represent functional
domains or important structural elements in the sequences.
• Gaps: gaps in the alignment are represented by "-" characters. They indicate insertions or
deletions (indels) in the sequences. The length and position of gaps can vary. Long gaps may
suggest significant sequence differences or structural variations.
• Conservation line and consensus: Clustal output includes a line under each alignment block
that marks the level of conservation at each position. "*" marks a fully conserved column,
while ":" and "." mark strongly and weakly similar residues in protein alignments. Some
alignment files also include a consensus sequence, which represents the most common
nucleotide at each position in the alignment.
• Phylogenetic analysis: the multiple sequence alignment can be used as input for phylogenetic
analysis to infer evolutionary relationships between sequences. Phylogenetic trees can be
generated from the alignment to visualize these relationships.
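The interpretation above can be reproduced on a small scale. The following Python sketch takes already aligned sequences of equal length and returns a consensus sequence plus a marker line in which "*" flags fully conserved columns. It only covers the "*" case, not the ":" and "." similarity groups, and the function name and toy alignment are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus_and_conservation(aligned: List[str]) -> Tuple[str, str]:
    """Return (consensus sequence, marker line) for aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    marks = []
    for column in zip(*aligned):
        # Count the residues in this column, ignoring gap characters
        counts = Counter(base for base in column if base != "-")
        if not counts:
            consensus.append("-")
            marks.append(" ")
            continue
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        # Fully conserved: no gaps and a single residue type in the column
        fully_conserved = "-" not in column and len(counts) == 1
        marks.append("*" if fully_conserved else " ")
    return "".join(consensus), "".join(marks)


# Hypothetical toy alignment of three sequences
consensus, marks = consensus_and_conservation(["ACGT-A", "ACCTTA", "ACGTTA"])
```

Long runs of "*" in the marker line correspond to the conserved regions described in the first point.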

Practical # 5
Predicting Protein Secondary Structure
Tool: PSIPRED
PSIPRED first normalizes the sequence profile generated by PSI-BLAST. A neural network then
predicts an initial secondary structure. For each amino acid in the sequence, the neural network
is fed a window of 15 residues.
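The windowing step can be sketched in Python as follows. This only illustrates how a 15-residue window centred on each amino acid could be cut out; it assumes that positions beyond the sequence ends are padded with a placeholder character, and it works on letters, whereas PSIPRED feeds profile values to its network. The function name `residue_windows` is hypothetical.

```python
from typing import List


def residue_windows(sequence: str, window: int = 15, pad: str = "-") -> List[str]:
    """Return one window per residue, centred on that residue."""
    if window % 2 == 0:
        raise ValueError("window size must be odd so that it has a centre")
    half = window // 2
    # Pad both ends so that the first and last residues also get full windows
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window] for i in range(len(sequence))]


# Hypothetical 10-residue peptide, shown with a short window of 5 for readability
windows = residue_windows("ACDEFGHIKL", window=5)
```

With the default `window=15`, each window holds the central residue plus seven neighbours on each side.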

Method:
• Open the PSIPRED website http://bioinf.cs.ucl.ac.uk/psipred/ in a web browser.
• Enter the FASTA sequence of the EGFR protein in the input box and add a job name.
• Click the Submit button.
• The results are shown on the screen.
Practical # 6
Predicting RNA Secondary Structure
Tool used: RNAfold
The RNAfold web server predicts the minimum free energy (MFE) secondary structure of a single
RNA or DNA sequence, together with base-pairing probabilities.
The related RNAalifold server predicts the consensus structure of a set of aligned DNA or RNA
sequences. It extends standard dynamic programming algorithms for RNA secondary structure
prediction by averaging the energy contributions over all sequences and incorporating covariation
terms into the energy model to reward compensatory mutations and penalize non-compatible
base-pairs. Like RNAfold, it supports prediction of the minimum free energy structure and
base-pairing probabilities and can handle circular sequences. Its input is a single multiple
sequence alignment in CLUSTAL W or FASTA format. It has only two additional parameters compared
to the RNAfold server, namely 'Weight of covariance term' and 'Penalty for non-compatible
sequences', which affect the covariance scoring scheme and the penalization of non-compatible
base-pairs of the RNAalifold algorithm. The output is similar to that of the RNAfold server, but
also features a structure-annotated alignment. Plots are augmented by a special coloring scheme
that indicates compensatory mutations. Note that the more mutations are observed that support a
certain base-pair, the more evidence there is that this base-pair is correctly predicted.
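To show what a dynamic programming approach to RNA folding looks like, the sketch below implements the Nussinov base-pair maximization algorithm in Python. This is a deliberately simpler stand-in: it maximizes the number of base pairs instead of minimizing free energy, so it is not the algorithm RNAfold runs. The minimum hairpin loop of 3 unpaired bases and the allowed pairs (A-U, G-C, G-U) are assumptions of this sketch.

```python
from typing import Tuple

ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(sequence: str, min_loop: int = 3) -> Tuple[int, str]:
    """Return (maximum number of base pairs, dot-bracket structure)."""
    n = len(sequence)
    # score[i][j] = maximum number of pairs in the subsequence i..j
    score = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (sequence[k], sequence[j]) in ALLOWED_PAIRS:
                    left = score[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + score[k + 1][j - 1])
            score[i][j] = best

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (sequence[k], sequence[j]) in ALLOWED_PAIRS:
                left = score[i][k - 1] if k > i else 0
                if score[i][j] == left + 1 + score[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (score[0][n - 1] if n else 0), "".join(structure)


# Hypothetical short RNA that can form a small hairpin
pairs, structure = nussinov("GGGAAAUCC")
```

In the returned string, matching parentheses mark paired bases and dots mark unpaired bases, the same dot-bracket notation in which RNAfold reports its structures.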

Method:
• Obtain the mRNA sequence of EGFR from the NCBI website.
• Open the RNAfold website
http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi in a web browser.
• Enter the RNA sequence of EGFR in the input box.
• Click the Proceed button to get the results.


Practical # 8
Finding Protein Families
Tool used: Pfam
Pfam is a database of protein families that includes their annotations and multiple sequence
alignments generated using hidden Markov models. Pfam 35.0 was released in November 2021 and
contains 19,632 families.

Features:
For each family in Pfam one can:
● View a description of the family
● Look at multiple alignments
● View protein domain architectures
● Examine species distribution
● Follow links to other databases
● View known protein structures

Method
• Open the Pfam website http://pfam-legacy.xfam.org/ in a web browser.
• Enter the accession number of the protein and click the Go button.

Practical # 11
Primer design

Objectives of primer designing


The objective of primer design is straightforward: to determine a set of
forward and reverse primers that will amplify one group of sequences
(the target) but no others (the non-targets).
Tool used
Primer3Plus
Primer3Plus is a web interface to the popular Primer3 primer design
program, offered as an enhanced alternative to the CGI scripts that come
with Primer3. Primer3 itself consists of a command line program and a
web interface. That web interface is one large form showing all of the
possible options, which makes it powerful but at the same time confusing
for occasional users. Primer3Plus provides an intuitive user interface
using present-day web technologies and was developed in close
collaboration with molecular biologists and technicians who regularly
design primers.
Method:
• Open the Primer3Plus website in a web browser.
• Retrieve the FASTA sequence of the target from the NCBI website.
• Enter the retrieved sequence in the source sequence box and click
Pick Primers.
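Once primers have been picked, a few basic properties can be checked by hand. The Python sketch below computes the reverse complement, the GC content, and a rough melting temperature using the Wallace rule, Tm = 2 × (A + T) + 4 × (G + C) in °C. The Wallace rule is only a quick approximation for short oligonucleotides and is an assumption of this sketch; Primer3 itself estimates Tm with a nearest-neighbor thermodynamic model. The function names and the example primer are hypothetical.

```python
# Translation table mapping each DNA base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 16-base primer, not taken from EGFR
primer = "ATGCATGCATGCATGC"
tm = wallace_tm(primer)
```

A reverse primer is written as the reverse complement of the target strand, so `reverse_complement` is the step that turns a region at the 3' end of the target into a primer sequence.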
OligoCalc:
• Open OligoCalc
