
2nd Practical of the Basic Bioinformatics Module

2.1. UniProtKB/Swiss-Prot
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution for experimental and computational data.
The UniProt Knowledgebase consists of two sections: one containing manually annotated records with information extracted from the literature and curator-evaluated computational analysis, and one containing computationally analysed records that await full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as "UniProtKB/Swiss-Prot" (reviewed, manually annotated) and "UniProtKB/TrEMBL" (unreviewed, automatically annotated), respectively.
Manual annotation consists of a critical review of experimentally proven or computer-predicted
data about each protein, including the protein sequences. Data are continuously updated by an
expert team of biologists.
We will now see how to search UniProtKB and what information a UniProtKB/Swiss-Prot entry contains.
Step 1. Enter http://www.uniprot.org/ in your browser's address bar, or just click on the link.
The UniProt main page appears (Figure 1). It is well organized and helpful, and several components on the start page help you find exactly what you need.
For now, we will not go into detail about the tools available on the UniProt website. We will simply search the database with our earlier example, Pho5, and see what information can be found in its Swiss-Prot entry.
Step 2. Type Pho5 in the search bar and click on Search.
There are 21 matches to the Pho5 query (there may be more by the time you do this practical). The results are displayed as in Figure 2 and come from both the TrEMBL and Swiss-Prot sections. The star symbol shows which section an entry belongs to (Figure 2, red label): entries with a yellow star belong to Swiss-Prot, whereas those without it belong to TrEMBL. There are several ways to retrieve only the reviewed (Swiss-Prot) entries.

Figure 1. The UniProt main page. The most important components are labelled in red: available tools, search bar, news feed, video tutorial, "What can be found here" and detailed help.

Figure 2. The UniProt results page. The section status is labelled in red, the section filter and other filters in green, and the Download link in blue.

To retrieve only the reviewed entries, perform the search and then click the Show only reviewed link on the results page (Figure 2, green label). Another way to retrieve only Swiss-Prot entries is to use the Advanced search settings.
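The same filtering can also be done programmatically. The sketch below assumes UniProt's REST search endpoint (`https://rest.uniprot.org/uniprotkb/search`) and its `reviewed:true` query term; the function name and default columns are illustrative choices, and the API details should be checked against the current UniProt help pages. It only builds the request URL, so it runs without network access.

```python
from urllib.parse import urlencode

# Assumed base URL of the UniProtKB search endpoint (current REST API).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Columns to return; names follow UniProt's REST "fields" vocabulary.
DEFAULT_FIELDS = ("accession", "id", "protein_name", "organism_name", "reviewed")


def build_search_url(term: str, reviewed_only: bool = True,
                     fmt: str = "tsv", fields=DEFAULT_FIELDS) -> str:
    """Return a UniProtKB search URL for `term`.

    reviewed_only=True appends `reviewed:true`, the query-language
    counterpart of the "Show only reviewed" filter (Swiss-Prot only).
    """
    query = f"({term})"
    if reviewed_only:
        query += " AND (reviewed:true)"
    params = {"query": query, "format": fmt, "fields": ",".join(fields)}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URL; open it in a browser or with
    # urllib.request.urlopen() to run the actual search.
    print(build_search_url("pho5"))
```

Keeping the network call out of the function makes it easy to inspect or reuse the query string, for example by pasting it into a browser.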
The retrieved entries can be downloaded in several ways. Select the entries you want, press the Download link (Figure 2, blue label) and choose the output format. For example, you can download the list of accession numbers or the sequences of the chosen entries in FASTA format.
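A FASTA file downloaded this way can be read with a few lines of code. In UniProt FASTA headers the first field is `sp` for Swiss-Prot (reviewed) or `tr` for TrEMBL (unreviewed), followed by the accession and the entry name, separated by `|`. The sketch below assumes exactly that layout; the record in the usage part is a hypothetical toy example, not a real entry.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


def parse_uniprot_header(header: str) -> Dict[str, str]:
    """Split 'db|ACCESSION|ENTRY_NAME description' into its parts."""
    db, accession, rest = header.split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return {
        # 'sp' marks Swiss-Prot (reviewed), 'tr' marks TrEMBL (unreviewed).
        "section": {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(db, db),
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


if __name__ == "__main__":
    # Hypothetical toy record, not a real UniProt entry.
    toy = ">sp|X00001|TOY_YEAST Toy protein\nMKT\nLLV\n"
    for header, seq in read_fasta(toy):
        info = parse_uniprot_header(header)
        print(info["section"], info["accession"], len(seq))  # Swiss-Prot X00001 6
```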
Step 3. Among the retrieved entries, find the Pho5 protein (repressible acid phosphatase) from the organism Saccharomyces cerevisiae and click its Entry link.
The chosen entry is displayed. We will now take a detailed look at the Swiss-Prot entry.
A Swiss-Prot entry is divided into ten sections, each with one or more subsections containing entry-specific information. With so many sections and subsections, it is easy to get lost in the abundance of information; for that reason, clicking the name of a section or subsection opens a help window describing it.
We will now briefly look at the sections of the Swiss-Prot entry to get a general picture of the information the database holds about the protein.
The first three sections, Names and origin, Protein attributes, and General annotation (Comments), are shown in Figure 3. They contain basic information about the protein.

Figure 3. The sections Names and origin, Protein attributes, and General annotation (comments) for the Pho5
protein

Figure 4. The Ontologies section

Figure 5. The Sequence annotation (Features) section

The Names and origin section, as its name suggests, contains the name of the protein and its alternative names, including the EC number (recall the Biochemistry I class for the definition of the EC number). It also includes the name of the gene coding for the protein, as well as the organism of origin with its taxonomic lineage. Note that each taxon in the lineage links to the UniProt Taxonomy database. You can access the taxonomic data in the UniProt Taxonomy database and the NCBI Taxonomy database using the NCBI unique identifier (taxonomic identifier).
The next section, Protein attributes, contains some useful information about the entry. In the case of Pho5, you can see that the protein is 467 aa long, that its sequence is complete, and that its existence has been confirmed by one or more analytical methods. You can also see that this protein is subject to post-translational modification.
The third section, General annotation (Comments), generally contains biochemically important information, for example the reaction catalysed by the enzyme, post-translational modifications, cell compartment, etc. Note the yellow links at the end of certain subsections. These links, which look like "Ref." followed by a number, lead to the References section, where the publications containing the information can be found.

Figure 6. The Sequence section

The next section is the Ontologies section (Figure 4). In information science, an ontology generally refers to a set of concepts within a domain and the relationships between those concepts. It can be used to reason about the entities within that domain and to describe the domain. One example is the Gene Ontology (GO). The Gene Ontology project is a major bioinformatics initiative that aims to standardize the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics, gene product annotation data from GO Consortium members, and tools to access and process these data. In this section, the keywords for specific fields can be found, as well as the terms from the Gene Ontology project database.
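Because an ontology records relationships between terms, software can reason over it: a protein annotated with a specific term is implicitly annotated with every more general term above it. The sketch below uses a small, hand-written chain of `is_a` links between molecular-function term names (simplified for illustration, not an export of the Gene Ontology) and collects all ancestors of a term with a breadth-first walk. Real GO terms can have several parents, which the same code handles because each term maps to a list.

```python
from collections import deque
from typing import Dict, List, Set

# Simplified, hand-written "is_a" fragment for illustration only:
# each term maps to the list of its direct parent terms.
IS_A: Dict[str, List[str]] = {
    "acid phosphatase activity": ["phosphatase activity"],
    "phosphatase activity": ["phosphoric ester hydrolase activity"],
    "phosphoric ester hydrolase activity": ["hydrolase activity, acting on ester bonds"],
    "hydrolase activity, acting on ester bonds": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": [],
}


def ancestors(term: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` via is_a links (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(graph.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # a term reachable by two paths is visited once
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("acid phosphatase activity", IS_A)):
        print(name)
```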
The Sequence annotation section (Figure 5) provides a precise but simple means of annotating sequence data. It describes regions or sites of interest in the protein sequence. In general, this section lists post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also included in this section.
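Feature positions in this section are counted from 1 on the canonical sequence, and a range includes both of its end positions. The sketch below shows how such a feature maps onto a sequence string; the `Feature` class, the toy sequence and the positions are hypothetical and not taken from the Pho5 entry.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # e.g. "Signal", "Active site", "Glycosylation"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def extract(self, sequence: str) -> str:
        """Return the residues of `sequence` covered by this feature."""
        if not 1 <= self.start <= self.end <= len(sequence):
            raise ValueError("feature lies outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice.
        return sequence[self.start - 1:self.end]


if __name__ == "__main__":
    toy_sequence = "MKAILVLAST"  # hypothetical 10-residue sequence
    print(Feature("Signal", 1, 4).extract(toy_sequence))       # MKAI
    print(Feature("Active site", 7, 7).extract(toy_sequence))  # L
```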
The Sequence section (Figure 6) displays the canonical protein sequence by default and, on request, all isoforms described in the entry. It also includes information pertinent to the sequence(s), including length and molecular weight. The protein sequence displayed by default, to which all positional annotation in the Sequence annotation section refers, is called the canonical sequence. Note that this section also provides various tools for analysing the sequence (e.g. BLAST, Compute pI/Mw, etc.). The sequence can easily be exported in FASTA format by clicking the FASTA link.
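The molecular weight shown in this section is, for an unmodified chain, the sum of the residue masses plus one water molecule for the free termini. A minimal sketch using commonly tabulated average residue masses (an assumption: the value UniProt displays may differ slightly because of the mass table or rounding used, and post-translational modifications are not included):

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(round(average_mw("G"), 2))  # free glycine: 75.07
```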
There are a few more sections. The References section, as mentioned before, contains the list of publications from which information for the annotation was retrieved, together with links to those publications. Each reference states which information was taken from that publication.
The Cross-references section provides links and unique identifiers that point to collections and/or databases other than UniProtKB.
Finally, there are the Entry information and Relevant documents sections. The Entry information section contains details such as the entry submission date, the date of last modification, the accession number, etc. The Relevant documents section contains links to documents relevant to the entry (for example genome sequence and annotation, protein family, etc.).

2.2. KEGG
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogues from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms, namely KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system.
Step 1. Enter http://www.genome.jp/kegg/ in your browser's address bar, or just click on the link.
The KEGG main page appears (Figure 1). KEGG offers a wide variety of options and information. In this practical we will consider only the part of KEGG that is likely to be of most interest. KEGG contains a database of metabolic pathways for a wide variety of organisms, which can be explored.

Figure 1. The KEGG main page

Step 2. Click the KEGG PATHWAY link on the KEGG main page, then, in section 0. Global map, click the Metabolic pathways link.
A map like the one in Figure 2 appears. It represents the reference metabolic pathway: the dots represent metabolites, whereas the lines represent metabolic reactions. Hovering the mouse cursor over a dot shows the unique identifier and a picture of the compound (Figure 3a); clicking the dot opens the entry of that metabolite. Hovering over a line shows the identifiers related to that reaction: those of the enzymes catalysing it (EC numbers), of the orthologous genes for that reaction, and of the reaction itself (Figure 3b). Clicking the line representing the reaction shows the entries involved in it, which are the entries behind the identifiers just mentioned.

Figure 2. The global metabolic pathway

Step 3. In the drop-down menu, find the organism Rickettsia prowazekii, select it and click the Go button.
A map slightly different from Figure 2 is displayed (Figure 4). Rickettsia prowazekii is an obligate intracellular, aerobic parasitic bacterium that is the etiologic agent of epidemic typhus, transmitted in the feces of lice. Because it is an obligate parasite, it lacks many metabolic pathways. The grey dots and lines in Figure 4 represent the metabolites and reactions that this bacterium lacks.
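Conceptually, the grey elements are a set difference: everything on the reference map minus what the selected organism is annotated to encode. KEGG derives organism-specific maps from the KEGG Orthology (KO) assignments of the organism's genes; the sketch below only illustrates the set logic, using hypothetical identifiers instead of real KEGG data.

```python
from typing import Set, Tuple


def split_map(reference: Set[str], organism: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split reference-map identifiers into (present, missing) for one organism."""
    present = reference & organism  # drawn in colour on the organism map
    missing = reference - organism  # drawn grey: not encoded by the organism
    return present, missing


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    reference_map = {"K_a", "K_b", "K_c", "K_d"}
    organism_genes = {"K_b", "K_d", "K_z"}  # K_z is not on this map
    present, missing = split_map(reference_map, organism_genes)
    print(sorted(present), sorted(missing))  # ['K_b', 'K_d'] ['K_a', 'K_c']
```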

Figure 3. a) Picture of the metabolite marked in red and b) unique identifiers of the entries involved in the reaction marked in red

Now that we have seen how metabolic pathways can be retrieved and explored for a specific organism, it is time to see how genes and proteins are represented in KEGG. First, return to the KEGG main page.

Figure 4. Rickettsia prowazekii metabolic pathway

Step 4. On the KEGG main page, enter Pho5 in the search bar, and then click the sce:YBR093C link.
The link you have just clicked leads to the Saccharomyces cerevisiae PHO5 gene. PHO5 genes from other organisms were also retrieved, but as you already know, we are interested in PHO5 from Saccharomyces cerevisiae in particular. The page is displayed as in Figure 5.

As you can see, the entry looks simple, but it is also quite complex because it contains many links and references. It contains standard sections such as gene name, definition with EC number, organism, references to other databases, protein sequence, nucleotide sequence, position, etc. We will focus on the Pathway section. Another interesting section, though not the focus of the present exercise, is the Orthology section. The Nucleotide section is also interesting: not only does it contain the DNA sequence, which can be exported in FASTA format, but it also offers an option to retrieve not just the gene sequence itself, but also n nucleotides upstream or downstream of the gene.
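The upstream/downstream option can be reproduced once the gene's chromosomal coordinates and strand are known. Note that "upstream" depends on the strand: the final C in YBR093C indicates that the gene lies on the Crick (complementary) strand, so its upstream region lies at higher chromosome coordinates and the extracted region must be reverse-complemented. A sketch on a toy chromosome, assuming 1-based inclusive coordinates (the function name and inputs are illustrative, not part of KEGG):

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T; other letters unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_with_flanks(chrom: str, start: int, end: int, strand: str,
                     upstream: int = 0, downstream: int = 0) -> str:
    """Return gene start..end (1-based, inclusive) plus flanking bases,
    written 5'->3' on the gene's own strand ('+' Watson, '-' Crick)."""
    if strand == "+":
        lo, hi = start - upstream, end + downstream
    elif strand == "-":
        # On the Crick strand the gene is read right to left,
        # so "upstream" means higher chromosome coordinates.
        lo, hi = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    lo, hi = max(lo, 1), min(hi, len(chrom))  # clip at chromosome ends
    region = chrom[lo - 1:hi]  # 1-based inclusive -> Python slice
    return region if strand == "+" else reverse_complement(region)


if __name__ == "__main__":
    toy_chrom = "GGGAAACCCTTT"  # hypothetical 12-bp "chromosome"
    # Gene at 4..9 on the minus strand, 2 bp upstream and 1 bp downstream.
    print(gene_with_flanks(toy_chrom, 4, 9, "-", upstream=2, downstream=1))  # AAGGGTTTC
```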

Figure 5. The sce:YBR093C entry
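The same entry can also be reached without the web interface through the KEGG REST API at `https://rest.kegg.jp`, whose operations include `find`, `get` and `link`. The sketch below only builds request URLs and parses the tab-separated output of `link`; the helper names are hypothetical, the sample response is hand-written to show the format rather than copied from KEGG, and KEGG's terms of use for the API apply.

```python
from typing import Dict, List

KEGG_REST = "https://rest.kegg.jp"  # base URL of the KEGG REST API


def kegg_url(operation: str, *args: str) -> str:
    """Build a KEGG REST URL, e.g. kegg_url("get", "sce:YBR093C")."""
    return "/".join([KEGG_REST, operation, *args])


def parse_link(text: str) -> Dict[str, List[str]]:
    """Parse `link` output (two tab-separated columns) into source -> targets."""
    links: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t", 1)
        links.setdefault(source, []).append(target)
    return links


if __name__ == "__main__":
    print(kegg_url("find", "genes", "pho5"))           # search genes for "pho5"
    print(kegg_url("get", "sce:YBR093C"))              # the entry as a flat file
    print(kegg_url("link", "pathway", "sce:YBR093C"))  # pathways linked to the gene
    # Hand-written sample in the shape of a `link` response:
    sample = "sce:YBR093C\tpath:sce00740\nsce:YBR093C\tpath:sce04111\n"
    print(parse_link(sample))
```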

Step 5. Click on the sce00740 link in the Pathway section.

The Pathway section contains information about the function of the gene. The gene product can be involved in metabolism, in the regulation of metabolic processes, or in the regulation of gene expression. The product of the PHO5 gene is involved in riboflavin metabolism. The page reached in Step 5 shows the riboflavin metabolic pathway, with the product of the PHO5 gene marked in red (Figure 6). This interactive figure has the same functionality as the global metabolic pathway figure, but because less data is shown, it is more detailed: the names of the corresponding metabolites appear under the dots, and the reaction lines carry the EC numbers of the enzymes that catalyse the reactions. If an enzyme is available in KEGG, its EC number links to that enzyme in the organism concerned. Available enzymes are highlighted in green. Clicking a dot representing a metabolite opens the entry of that metabolite.
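KEGG pathway identifiers follow a simple pattern: an organism code (here `sce` for Saccharomyces cerevisiae) or the prefix `map` for the reference pathway, followed by a five-digit map number. Replacing the organism code with `map` therefore gives the reference version of the same pathway, e.g. sce00740 and map00740 for riboflavin metabolism. A small sketch of that convention (the regular expression assumes three- or four-letter organism codes; other KEGG prefixes such as `ko` and `ec` are not handled):

```python
import re

# Optional "path:" prefix, then a 3-4 letter organism code or "map", then 5 digits.
PATHWAY_ID = re.compile(r"^(?:path:)?([a-z]{3,4})(\d{5})$")


def to_reference_map(pathway_id: str) -> str:
    """Map an organism-specific pathway ID (e.g. 'sce00740') to 'map00740'."""
    match = PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a KEGG pathway ID: {pathway_id!r}")
    return "map" + match.group(2)  # keep the map number, swap the prefix


if __name__ == "__main__":
    print(to_reference_map("sce00740"))       # map00740
    print(to_reference_map("path:sce04111"))  # map04111
```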
Now we will return to the YBR093C entry to analyse it further.

Figure 6. The riboflavin metabolism pathway. Pho5 is labelled in red.

Step 6. On the YBR093C entry page, click the sce04111 link in the Pathway section.
This displays part of the yeast cell cycle regulation pathway (Figure 7). The map shows which proteins are involved in the yeast cell cycle: proteins that regulate the activity of other proteins, transcription factors, etc. Proteins that are clustered together form complexes with each other. As on the metabolic pathway map, the Pho5 gene is marked in red, and available genes are highlighted in green; their entries can be retrieved by clicking on them. The lines represent the influence of one protein on another, and specific line types indicate repression, activation, phosphorylation, etc. Because these lines do not represent reactions in the usual sense of the word, they do not link to anything.

Figure 7. The yeast cell cycle. Pho5 is labelled in red.
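Unlike the metabolic maps, the cell cycle map is essentially a directed graph whose edges carry a relation type. A minimal sketch of that representation, with hypothetical protein names and relation labels taken from the kinds mentioned above (activation, repression, phosphorylation), which groups the proteins acting on a given target by relation:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, target, relation); protein names are hypothetical placeholders.
EDGES: List[Tuple[str, str, str]] = [
    ("KinaseA", "FactorB", "phosphorylation"),
    ("FactorB", "GeneC", "activation"),
    ("RepressorD", "GeneC", "repression"),
]


def regulators(target: str, edges: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group the proteins acting on `target` by relation type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, tgt, relation in edges:
        if tgt == target:
            grouped[relation].append(source)
    return dict(grouped)


if __name__ == "__main__":
    print(regulators("GeneC", EDGES))
    # {'activation': ['FactorB'], 'repression': ['RepressorD']}
```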
