BI Manual

The document contains details about 7 bioinformatics practicals to be completed by Fazila Fatima, a 7th semester student in the Department of Zoology. The practicals include retrieving FASTA sequences from NCBI, determining protein parameters using ProtParam, finding similar sequences using BLAST, and performing multiple sequence alignment using ClustalW. Methods for each practical are described.


Name: Fazila Fatima

Department: Zoology
Roll No.: bsf2001858
Semester: 7th

Bioinformatics
Practicals

Practical # 1: Retrieval of FASTA sequence
Practical # 2: Determination of protein physical and chemical parameters
Practical # 3: Finding similar sequences for proteins and DNA
Practical # 4: Multiple alignment
Practical # 5: Predicting protein secondary structure
Practical # 6: Predicting RNA secondary structure
Practical # 8: Finding protein families

Practical # 1
Retrieval of FASTA sequence using NCBI
Introduction to NCBI
NCBI stands for the National Center for Biotechnology Information. It is a division of the
National Library of Medicine (NLM) at the U.S. National Institutes of Health and a leader in the
field of bioinformatics. NCBI studies computational approaches to fundamental questions in
biology and provides online delivery of biomedical information and bioinformatics tools. It
produces a variety of online information resources for biology, including the GenBank nucleic
acid sequence database and the PubMed database of citations and abstracts published in life
science journals. NCBI provides search and retrieval operations for most of these data from 35
distinct databases.
Entrez Global Query is an integrated search and retrieval system that provides access to all of
these databases simultaneously through a single query string and user interface.
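The same Entrez system can also be queried programmatically through the NCBI E-utilities. The sketch below only builds the request URLs and does not contact the server. The helper names and the search term are illustrative, and the accession NM_005228 is an assumption (it is commonly listed for human EGFR mRNA, but should be checked on NCBI before use).

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (the programmatic interface to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nucleotide", retmax=5):
    """Build an ESearch URL that looks up record IDs matching a query string."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_fasta_url(accession, db="nucleotide"):
    """Build an EFetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative query: EGFR restricted to Homo sapiens.
search_url = build_esearch_url("EGFR[Gene Name] AND Homo sapiens[Organism]")
# Assumed accession for illustration only; verify it on NCBI.
fetch_url = build_efetch_fasta_url("NM_005228")

print(search_url)
print(fetch_url)
```

Opening the second URL in a browser should return the record in FASTA format, the same text obtained from the FASTA link on the website.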

Retrieving FASTA sequence for nucleotide

Home Page
NCBI has a simplified homepage from which the user can navigate to different resources. The
left-side pane of the homepage has a site map followed by different categories, which narrow
down the search for the right sequence. On the right side is a list of popular resources, which
is very useful for first-time users.

METHODS
• First, open the NCBI website https://www.ncbi.nlm.nih.gov.
• Then open the list of databases and choose the option Nucleotide.

Open databases

• Enter the gene name EGFR in the search box and click on Search. Select the species on the
right side.
• Click on Homo sapiens in the taxon bar; the result page for the gene opens.

Retrieve FASTA
FASTA Sequence:
A gene is the molecular unit of heredity of a living organism. It is the name given to certain
stretches of DNA and RNA that code for a type of protein or a strand of RNA that has some
function in an organism. Knowledge of gene sequences has become indispensable for basic
biological research, other research branches using sequencing, and in many applied fields such
as diagnostics, biotechnology, forensic biology, and biological systematics. In bioinformatics, the
FASTA format is a textual format for representing either nucleotide sequences or peptide
sequences in which nucleotides or amino acids are represented by single-letter codes. The
format also allows for sequence names and comments before sequences. The format originates
from the FASTA software package but has now become a standard in bioinformatics.
• Obtain the relevant information about the gene and retrieve its sequence in FASTA format by
clicking on the FASTA link at the left corner.
FASTA Sequence of EGFR gene
CTGGTTGTGCATTTGCTGTGGGTTCCCTCCGGCAGGCGACCTCTCCGCGCTGAGAAGGTTATCCGGATAAC
CAAGTAATTATGTGGTGACAGATCACGGCTCGTGCGTCCGAGCCTGTGGGGCCGACAGCTATGAGATGGA
GGAAGACGGCGTCCGCAAGTGTAAGAAGTGCGAAGGGCCTTGCCGCAAAGTGTGTAACGGAATAGGTAT
TGGTGAATTTAAAGACTCACTCTCCATAAATGCTACGAATATTAAACACTTCAAAAACTGCACCTCCATCAGT
GGCGATC
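To make the FASTA layout concrete, here is a minimal parsing sketch in plain Python. The header text `example_EGFR_fragment` is hypothetical (the sequence above is shown without its header line), and the two sequence lines are just the first 40 bases of the fragment above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated; line breaks carry no meaning.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical header; sequence lines taken from the start of the fragment above.
example = """>example_EGFR_fragment
CTGGTTGTGCATTTGCTGTGGGTTCCCTCC
GGCAGGCGAC
"""

records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```

The header line starts with `>` and everything after it, up to the next `>`, belongs to the same sequence.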
Practical # 2
Determination of protein physical and chemical
parameters
Physical Parameters:
A physical parameter is a characteristic of matter that may be observed and measured without
changing the chemical identity of a sample. The measurement of a physical property may change
the arrangement of matter in a sample but not the structure of its molecules.

Chemical Parameters:
Chemical parameters describe properties that depend on the chemical composition of a sample.
For a protein, examples include the amino acid composition, the atomic composition, the number
of charged residues and the theoretical pI. These parameters are important because different
applications generally have different requirements.

Tool used for chemical and physical properties of proteins: Expasy ProtParam

ProtParam
ProtParam is a tool that calculates various physical and chemical parameters for a given protein
stored in Swiss-Prot or TrEMBL or for a user-specified protein sequence. The calculated
parameters include molecular weight, theoretical pI, amino acid composition, extinction
coefficient, estimated half-life, instability index, aliphatic index, and grand average of
hydropathicity (GRAVY).

METHOD
• Open the Expasy ProtParam website https://web.expasy.org/protparam/ in a web browser.
• Paste the amino acid sequence of the EGFR protein (retrieved from NCBI) into the box and click
on Compute parameters.
The result page appears and shows different physical and chemical parameters of the EGFR
protein, such as:

• Number of amino acids
• Molecular weight
• Theoretical pI: The isoelectric point, that is, the pH at which the protein has no net charge
because its positive and negative charges are equal.
• Amino acid composition
• Total number of negatively and positively charged residues
• Atomic composition
• Formula and total number of atoms
• Extinction coefficient: A measure of how strongly a species absorbs light at a particular
wavelength (ProtParam reports it at 280 nm). It is measured in M^-1 cm^-1.
• Instability index: An estimate of whether the protein will be stable in a test tube. If the
index is less than 40, the protein is probably stable in the test tube. If it is greater than 40,
it is probably not stable.
• Aliphatic index: The relative volume occupied by aliphatic side chains (alanine, valine,
isoleucine and leucine).
• GRAVY (grand average of hydropathicity): The sum of the hydropathy values of all amino acids
in the peptide or protein divided by the number of residues.
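Three of the parameters listed above can be reproduced by hand. The sketch below is not ProtParam's own code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly cited aliphatic index formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The peptide `MKDEAILV` is a made-up example, not the EGFR sequence.

```python
# Kyte-Doolittle hydropathy values (assumed here to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: sum of hydropathy values / length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    def mole_percent(aa):
        return 100.0 * protein.count(aa) / len(protein)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(protein):
    """Return (negatively charged, positively charged) residue counts."""
    negative = protein.count("D") + protein.count("E")  # Asp + Glu
    positive = protein.count("R") + protein.count("K")  # Arg + Lys
    return negative, positive


peptide = "MKDEAILV"  # hypothetical peptide for illustration
print("GRAVY:", gravy(peptide))
print("Aliphatic index:", aliphatic_index(peptide))
print("Charged residues (-, +):", charged_residue_counts(peptide))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.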

Practical # 3
Finding similar sequence for proteins and DNA
BLAST
BLAST stands for Basic Local Alignment Search Tool. BLAST finds regions of similarity between
biological sequences. The program compares nucleotide or protein sequences to sequence
databases and calculates the statistical significance of the matches. BLAST can be used to infer
functional and evolutionary relationships between sequences as well as to help identify members
of gene families.

Method:
• Open the BLAST website https://blast.ncbi.nlm.nih.gov/Blast.cgi in a web browser.
• Select Nucleotide BLAST; the blastn suite page opens.
• Paste the FASTA sequence of EGFR into the query box, add a job title in the respective field,
then click on the BLAST button.
The results of BLAST are shown in four parts:
1. Description
2. Graphic summary
3. Alignment
4. Taxonomy

• Description: This part lists the similar sequences (hits). By default the most significant
hits come first, that is, the list is in ascending order of E value.
Query coverage: Query cover is the percentage of the query sequence that is covered by the
alignment with the database (subject) sequence.
Percentage identity: Percent identity is the percentage of aligned positions at which the query
and the subject sequence are identical.
E value: The "E-value" (Expect value) is a statistical measure that represents the expected
number of random alignments that would have a score equal to or better than the one
obtained in the search, purely by chance. The lower the E value, the more significant the match.
• Graphic summary: A graphic summary in a BLAST report is a visual representation of the
sequence comparison between the query sequence and the database sequences found to be similar.
• Alignment: In BLAST, alignment refers to the process of finding and displaying regions of
similarity between two or more sequences. The primary purpose of alignment in BLAST
is to identify regions where the query sequence and a sequence from a database (or
multiple sequences) share similarity, which can provide insights into potential homology,
functional conservation, or evolutionary relationships.
• Taxonomy: In BLAST, taxonomy helps to identify the evolutionary relationships and origin of
the sequences that match the query sequence in the search. It is used to classify and organize
the matching sequences based on their evolutionary history and relationships.
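The three statistics in the Description table can be illustrated with a small calculation. This is a simplified sketch, not BLAST's implementation: the function names are made up, and the E value uses the textbook relation E = m × n × 2^(−S') for a bit score S', with the raw query and database lengths standing in for the effective lengths that BLAST actually uses.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


def query_coverage(query_length, aligned_start, aligned_end):
    """Percentage of the query covered by the alignment (1-based, inclusive)."""
    return 100.0 * (aligned_end - aligned_start + 1) / query_length


def expect_value(bit_score, query_length, database_length):
    """Simplified E value: search space times 2 to the power of minus the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Toy alignment with one mismatch in six columns.
print(percent_identity("ACGTAC", "ACGTTC"))
# Alignment covering positions 11 to 110 of a 200-base query.
print(query_coverage(200, 11, 110))
# A higher bit score or a smaller database gives a smaller E value.
print(expect_value(40, 500, 1_000_000))
```

The last call shows why the same alignment becomes less significant as the database grows: the E value scales with the size of the search space.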
Practical # 4
Multiple Alignment

Tool: ClustalW
ClustalW is a widely used program for aligning any number of homologous nucleotide or protein
sequences. For multiple sequence alignments, ClustalW uses a progressive alignment method:
the most similar sequences, that is, those with the best alignment score, are aligned first.
Then progressively more distant groups of sequences are aligned until a global alignment is
obtained.

Method of multiple alignment

• Open the ClustalW website https://www.genome.jp/tools-bin/clustalw in a web browser.
• Collect the different FASTA sequences in one file (for example in Word) to get their multiple
alignment. Each sequence should start with a '>' character followed by a sequence identifier
and then the sequence itself.

• Enter the multiple sequences in the input box and click on Execute Multiple
Alignment. The output page opens.
Result Interpretation
• Conserved regions: Positions where most or all sequences have the same nucleotide (A, T, C, G)
indicate conserved regions. Conserved regions are often biologically significant, as they may
represent functional domains or important structural elements in the sequences.
• Gaps: Gaps in the alignment are represented by "-" characters. They indicate insertions or
deletions (indels) in the sequences. The length and position of gaps can vary. Long gaps may
suggest significant sequence differences or structural variations.
• Consensus line: The alignment output includes a line of symbols beneath the sequences that
indicates the level of conservation at each position. "*" marks a position where all sequences
have the same residue; in protein alignments, ":" and "." mark positions with strongly and
weakly similar residues. Some alignment files may also include a consensus sequence, which
represents the most common nucleotide at each position in the alignment.
• Phylogenetic analysis: The multiple sequence alignment can be used as input for phylogenetic
analysis to infer evolutionary relationships between sequences. Phylogenetic trees can be
generated to visualize these relationships based on the alignment.
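The conservation marks and the consensus sequence described above can be derived column by column from an alignment. The sketch below is a plain-Python illustration, not ClustalW's own code; it marks only fully conserved columns with "*", and the three aligned sequences are invented for the example.

```python
from collections import Counter


def conservation_line(aligned):
    """Return '*' for columns where every sequence has the same residue, else ' '."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        fully_conserved = first != "-" and all(c == first for c in column)
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


def consensus_sequence(aligned):
    """Return the most common non-gap residue of each column ('-' if all gaps)."""
    consensus = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            consensus.append("-")
        else:
            # On a tie, Counter keeps the residue that was seen first.
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


# Invented alignment of three short sequences (gaps shown as '-').
alignment = ["ACGT-A", "ACGTTA", "ACCTTA"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
print(consensus_sequence(alignment))
```

Columns with a mismatch or a gap receive no mark, which is how unconserved positions stand out when reading the output page.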

Practical # 5
Predicting Protein Secondary Structure
Tool: PSIPRED
PSIPRED predicts secondary structure from a sequence profile generated by PSI-BLAST. The
profile is first normalized, and then a neural network predicts the initial secondary structure.
For each amino acid in the sequence, the neural network is fed a window of 15 residues centered
on that amino acid.
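The windowing step can be sketched as follows. This illustrates only how one window of 15 positions per residue could be cut out of a sequence, with padding at the ends; it is not PSIPRED code. The real network receives PSI-BLAST profile values for each window position rather than letters, and both the pad symbol and the example sequence are assumptions.

```python
def residue_windows(sequence, size=15, pad="-"):
    """Return one window per residue, centered on it and padded at the ends."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a center")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


# Made-up amino acid sequence for illustration.
protein = "MRPSGTAGAALLALLAALCPASRA"
windows = residue_windows(protein)

# Each window is 15 long and its middle position is the residue being predicted.
for residue, window in list(zip(protein, windows))[:3]:
    print(residue, window)
```

With a window of 15, the prediction for each residue draws on the seven residues on either side of it.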

Method:
• Open the PSIPRED website http://bioinf.cs.ucl.ac.uk/psipred/ in a web browser.
• Enter the amino acid sequence of EGFR (in FASTA format) in the input box and add a job name.
• Press the Submit button.
• The results are shown on the screen.
Practical # 6
Predicting RNA Secondary Structure
Tool used: RNAfold
RNAfold predicts the minimum free energy (MFE) secondary structure and the base-pairing
probabilities of a single RNA or DNA sequence using dynamic programming. The related RNAalifold
server predicts the consensus structure of a set of aligned DNA or RNA sequences. It extends
standard dynamic programming algorithms for RNA secondary structure prediction by averaging
the energy contributions over all sequences and incorporating covariation terms into the energy
model to reward compensatory mutations and penalize non-compatible base pairs. Like RNAfold, it
supports prediction of the minimum free energy structure and base-pairing probabilities and
can handle circular sequences. Its input is a single multiple sequence alignment in CLUSTAL W
or FASTA format. Compared to the RNAfold server there are only two additional parameters,
namely 'Weight of covariance term' and 'Penalty for non-compatible sequences', which affect
the covariance scoring scheme and the penalization of non-compatible base pairs in the
RNAalifold algorithm. The output is similar to that of the RNAfold server, but also features a
structure-annotated alignment. Plots are augmented by a special coloring scheme that indicates
compensatory mutations. The more mutations are observed that support a certain base pair, the
more evidence there is that this base pair is correctly predicted.
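RNAfold reports the predicted structure in dot-bracket notation, where matching parentheses are paired bases and dots are unpaired bases. The sketch below shows how such a string can be turned into a list of base pairs and how a DNA sequence from NCBI can be written as RNA first. The sequence and the structure string are invented for illustration and are not RNAfold output.

```python
def dna_to_rna(sequence):
    """Write a DNA sequence as RNA by replacing T with U."""
    return sequence.upper().replace("T", "U")


def dot_bracket_pairs(structure):
    """Return sorted (i, j) index pairs (0-based) for a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("closing bracket without an opening partner")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if stack:
        raise ValueError("opening bracket without a closing partner")
    return sorted(pairs)


rna = dna_to_rna("GGGAAATCCC")  # invented sequence
structure = "(((....)))"        # invented hairpin, not a real prediction
for i, j in dot_bracket_pairs(structure):
    print(f"{rna[i]}{i + 1} pairs with {rna[j]}{j + 1}")
```

In this toy hairpin the three G residues at the start pair with the three C residues at the end, leaving a four-base loop.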

Method:
• Open the RNAfold website
http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi in a web browser.

• Obtain the mRNA sequence of EGFR from the NCBI website.

• Enter the RNA sequence of EGFR in the input box.

• Click on the Proceed button to get the results.

Practical # 8
Finding Protein Families
Tool used: Pfam
Pfam is a database of protein families that includes their annotations and multiple sequence
alignments generated using hidden Markov models. Pfam 35.0 was released in November 2021 and
contains 19,632 families.

Features:
For each family in Pfam one can:
● View a description of the family
● Look at multiple alignments
● View protein domain architectures
● Examine species distribution
● Follow links to other databases
● View known protein structures

Method
• Open the Pfam website http://pfam-legacy.xfam.org/ in a web browser.
• Enter the accession number of the protein and click on Go.

Practical # 11
Primer design

Objectives of primer designing

The objective of primer design is straightforward: to determine a set of
forward and reverse primers that will amplify one group of sequences
(the target) but no others (the non-targets).
Tool used
Primer3Plus
Primer3Plus is a web interface to the popular Primer3 primer design
program, offered as an enhanced alternative to the CGI scripts that come
with Primer3. Primer3 consists of a command-line program and a web
interface. That web interface is one large form showing all of the
possible options, which makes it powerful but at the same time confusing
for occasional users. Primer3Plus provides an intuitive user interface
using present-day web technologies and has been developed in close
collaboration with molecular biologists and technicians who regularly
design primers.
Method:
• Open the Primer3Plus website in a web browser.
• Retrieve the FASTA sequence of the target gene from the NCBI
website.
• Enter the retrieved sequence in the source sequence box and click on
Pick Primers.
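A few basic primer properties can be checked by hand. The sketch below is not Primer3 or Primer3Plus code and does not design primers. It takes the two ends of a 40-base template (the start of the EGFR fragment from Practical # 1) purely for illustration and estimates melting temperature with the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C). This is an assumption made for simplicity; primer design tools use more refined methods.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def gc_content(sequence):
    """Percentage of G and C bases in the sequence."""
    sequence = sequence.upper()
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def wallace_tm(sequence):
    """Rough melting temperature in degrees Celsius (Wallace rule of thumb)."""
    sequence = sequence.upper()
    at = sequence.count("A") + sequence.count("T")
    gc = sequence.count("G") + sequence.count("C")
    return 2 * at + 4 * gc


# First 40 bases of the EGFR fragment shown in Practical # 1.
template = "CTGGTTGTGCATTTGCTGTGGGTTCCCTCCGGCAGGCGAC"

# Illustrative choice only: the forward primer copies the start of the template,
# the reverse primer is the reverse complement of its end.
forward_primer = template[:20]
reverse_primer = reverse_complement(template[-20:])

for name, primer in (("forward", forward_primer), ("reverse", reverse_primer)):
    print(name, primer, f"GC={gc_content(primer):.1f}%", f"Tm~{wallace_tm(primer)}C")
```

The reverse primer is written as the reverse complement because it binds the opposite strand and is read 5' to 3' like the forward primer.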
OligoCalc:
• Open OligoCalc
