
Sequence databases

BINF2010

Sara Ballouz, PhD
CSE, UNSW
K17 401A
Slides adapted from Marc Wilkins (BABS) and Bruno Gaeta (CSE)
Learning outcomes
• Discuss the limitations of current sequence databases
• Describe the general structure of a sequence database entry
• Discuss the types of data available in GenBank and UniProt
• Use GenBank and UniProt to retrieve sequence information
• Be aware of the UniProt API for retrieving sequence information programmatically
Sequence data

Life is about information


• DNA and proteins are information-carrying molecules
Sequence data

Reductionist and synthetic approaches in biology

Biological System (Organism)
Reductionist Approach (Experiments) | Synthetic Approach (Bioinformatics)
Building Blocks (Genes/Molecules)
Kanehisa (2000) Post-genome Informatics
Sequence data

Biology… not so simple!

From Attwood and Parry-Smith, 1999


Databases

Databases: terminology
• Database: collection of information related to a specific subject (e.g., a phone book)
• Record: an entry in a database (e.g., your entry in the phone book)
• Field: a component of a record (e.g., your address & number)
Databases

Databases: types

• Flat-file: store data as text files
• Relational: interconnected tables, use a database management system
Databases

Flat-file databases: example


Databases

Flat-file databases
Pros
• Easy to put together and distribute
• No need for expensive or complicated database management software
Cons
• Detailed targeted searching is difficult
• Searching is not efficient
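The trade-off above can be illustrated with a minimal Python sketch. It assumes a tiny tab-delimited flat file; the records, the column names and the helper `find_records` are made up for illustration. The file is trivial to create and share, but every search has to scan every line.

```python
import csv
import io

# Hypothetical flat file: one record per line, one field per column.
FLAT_FILE = (
    "code\tname\torganism\n"
    "P1001\texample protein A\tHomo sapiens\n"
    "P1002\texample protein B\tMus musculus\n"
    "P1003\texample protein C\tHomo sapiens\n"
)

def find_records(text, field, value):
    """Return every record whose `field` equals `value` by scanning all lines."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [record for record in reader if record[field] == value]

human = find_records(FLAT_FILE, "organism", "Homo sapiens")
print([record["code"] for record in human])
```

There is no index, so the cost of each search grows with the size of the file.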
Databases

Relational database: example tables


Databases

Relational databases
• Require a Relational Database Management System (RDBMS)
• Queried using SQL (or more commonly, a GUI front-end)

SELECT protab1.protein_name, protab2.protein_sequence
FROM protab1, protab2
WHERE protab1.protein_code = protab2.protein_code
AND protab1.protein_code = 'P1002';
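To try the query, here is a small sketch using Python's built-in `sqlite3` module as the RDBMS. The table contents are invented placeholders. The column names are written with underscores, which is an assumption about the intended schema: a hyphen inside an unquoted SQL identifier is read as a minus sign.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by protein_code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protab1 (protein_code TEXT PRIMARY KEY, protein_name TEXT);
    CREATE TABLE protab2 (protein_code TEXT PRIMARY KEY, protein_sequence TEXT);
    INSERT INTO protab1 VALUES ('P1001', 'example protein A');
    INSERT INTO protab1 VALUES ('P1002', 'example protein B');
    INSERT INTO protab2 VALUES ('P1001', 'MKT');
    INSERT INTO protab2 VALUES ('P1002', 'MLS');
""")

def lookup(code):
    """Join the two tables on protein_code and return name and sequence."""
    query = """
        SELECT protab1.protein_name, protab2.protein_sequence
        FROM protab1, protab2
        WHERE protab1.protein_code = protab2.protein_code
        AND protab1.protein_code = ?
    """
    return conn.execute(query, (code,)).fetchall()

print(lookup("P1002"))
```

The `?` placeholder passes the code as a parameter rather than pasting it into the SQL text, which is the usual way to avoid quoting problems.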
Databases

Sequence data in a database


• Primary data
• e.g., DNA sequence, protein sequence, protein 3D structure
coordinates

• Annotations (metadata)
• e.g., Authors, literature references, protein function, organism
of origin, location of coding regions in DNA sequence, etc.
Sequence data

Sequence database record structure


• A sequence database record contains
both sequence and annotations
• Record divided broadly into 3 sections:
• Header
• Feature table
• Sequence
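The three sections can be separated with a few lines of Python. This is a sketch assuming a GenBank-style flat-file record, where the feature table starts at the `FEATURES` keyword, the sequence starts at `ORIGIN`, and the record ends with `//`. The function name `split_record` is made up here, and the sample record is abbreviated.

```python
def split_record(text):
    """Split a GenBank-style record into header, feature table and sequence."""
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # end-of-record marker
        sections[current].append(line)
    return sections

# Abbreviated record, for illustration only.
RECORD = """LOCUS       HUMSOMI  2667 bp  DNA  linear  PRI 13-JAN-1995
DEFINITION  Human somatostatin I gene and flanks.
FEATURES             Location/Qualifiers
     source          1..2667
ORIGIN
        1 gaattcaagg acaggttttc
//"""

parts = split_record(RECORD)
print({name: len(lines) for name, lines in parts.items()})
```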
Sequence data

Sequence databases
• Nucleic acids (DNA/RNA): GenBank
• Proteins: UniProt
• Specialized/other:
• Non-coding RNA databases: RNAcentral
• Variation databases: gnomAD
• Cancer genomes: TCGA
Nucleic acid data

Nucleotide sequence database: GenBank


• Part of an international consortium to manage sequence data (INSDC)
• GenBank – NIH, USA
• DDBJ – DNA Database of Japan
• EMBL – European Molecular Biology Laboratory
• A collection of nucleic acid sequence data
• DNA, RNA (mRNA, rRNA, tRNA, microRNA, ncRNA…)
• Some sequences are translated to protein sequence
• Is not actively curated
• Is available to the public at no cost

http://www.ncbi.nlm.nih.gov/Genbank/index.html
Nucleic acid data

GenBank: statistics
Release #261 (June 2024)
• GenBank: 3,387,240,663,231 bases, from 251,094,334 reported sequences
• WGS: 27,900,199,328,333 bases, from 3,380,877,515 reported sequences

https://www.ncbi.nlm.nih.gov/genbank/statistics/
Nucleic acid data

GenBank anatomy: header


LOCUS       HUMSOMI                 2667 bp    DNA     linear   PRI 13-JAN-1995
DEFINITION  Human somatostatin I gene and flanks.
ACCESSION   J00306
VERSION     J00306.1  GI:338287
KEYWORDS    neuropeptide Y; somatostatin; somatostatin I; somatostatin-14;
            somatostatin-28.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1126 to 1368; 2246 to 2605)
  AUTHORS   Shen,L.P., Pictet,R.L. and Rutter,W.J.
  TITLE     Human somatostatin I: sequence of the cDNA
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 79 (15), 4575-4579 (1982)
   PUBMED   6126875
REFERENCE   2  (bases 1 to 2667)
  AUTHORS   Shen,L.P. and Rutter,W.J.
  TITLE     Sequence of the human somatostatin I gene
  JOURNAL   Science 224 (4645), 168-171 (1984)
   PUBMED   6142531
COMMENT     Original source text: Human fetal liver DNA, Charon 4A library,
            clone pHSI-1-2.7 [2], and pancreatic somatostatinoma tissue,
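The first header line packs several fields into one row. A rough sketch of reading it in Python follows. It assumes every field is present and separated by whitespace, as in this record; the function name `parse_locus` and the dictionary keys are illustrative choices, not part of the GenBank format.

```python
def parse_locus(line):
    """Split a LOCUS line into named fields (assumes all fields are present)."""
    parts = line.split()
    return {
        "name": parts[1],
        "length": int(parts[2]),  # parts[3] is the unit, "bp"
        "molecule": parts[4],
        "topology": parts[5],
        "division": parts[6],
        "date": parts[7],
    }

locus = parse_locus("LOCUS HUMSOMI 2667 bp DNA linear PRI 13-JAN-1995")
print(locus["name"], locus["length"], locus["division"])
```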
Nucleic acid data

GenBank anatomy: features


FEATURES             Location/Qualifiers
     source          1..2667
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /map="3q28"
     prim_transcript 1126..2605
                     /note="som I mRNA"
     CDS             join(1231..1368,2246..2458)
                     /note="preprosomatostatin I"
                     /codon_start=1
                     /protein_id="AAA60566.1"
                     /db_xref="GI:338288"
                     /translation="MLSCRLQCALAALSIVLALGCVTGAPSDPRLRQFLQKSLAAAAG
                     KQELAKYFLAELLSEPNQTENDALEPEDLSQAAEQDEMRLELQRSANSNPAMAPRERK
                     AGCKNFFWKTFTSC"
     sig_peptide     1231..1302
                     /note="prosomatostatin I signal peptide"
     mat_peptide     2372..2455
                     /product="somatostatin-28 peptide"
     mat_peptide     2414..2455
                     /product="somatostatin-14 peptide"
     gene            1231..1368
                     /gene="SST"
     exon            <1231..1368
                     /gene="SST"
                     /note="preprosomatostatin I; G00-119-604"
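Feature locations are 1-based, inclusive coordinates on the sequence. The sketch below reads simple locations such as the CDS `join(...)` above. It deliberately ignores `complement(...)` and single-base positions, and the function names are made up for illustration.

```python
import re

def parse_location(location):
    """Return (start, end) pairs, 1-based inclusive, from a simple location."""
    pairs = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    return [(int(start), int(end)) for start, end in pairs]

def total_length(location):
    """Sum the lengths of all segments in the location."""
    return sum(end - start + 1 for start, end in parse_location(location))

cds = "join(1231..1368,2246..2458)"
print(parse_location(cds))  # [(1231, 1368), (2246, 2458)]
print(total_length(cds))    # 351
```

The two exons give 138 + 213 = 351 bases, i.e. 117 codons. That is consistent with the 116 residues in `/translation` plus a stop codon.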
Nucleic acid data

GenBank anatomy: sequence


ORIGIN Chromosome 3q28; 1 bp upstream of EcoRI site.
1 gaattcaagg acaggttttc ttaaactttc tttgtttcta ggagatcagg cagagctgaa
61 tttaaccaag aatcttttga tcctttccac atatagatat acaatagtgg tcacatatgt
121 tctgggagtt cctagacctt atatgtctaa actggggctt cctgacataa aactatgctt
181 accggcagga atctgttaga aaactcagag ctcagtagaa ggaacactgg ctttggaatg
241 tggaggtctg gttttgctca aagtgtgcag tatgtgaagg agaacaattt actgaccatt
301 actctgcctt actgattcaa attctgaggt ttattgaata atttcttaga ttgccttcca
361 gctctaaatt tctcagcacc aaaatgaagt ccatttcaat ctctctctct ctctttccct
421 cccgtacata tacacacact catacatata tatggtcaca atagaaaggc aggtagatca
481 gaagtctcag ttgctgagaa agagggaggg agggtgagcc agagtacttc tcccccattg
541 tagagaaaag tgaagttctt ttagagcccc gttacatctt caaggccttt tatgagataa
601 tggaggaaat aaagagggct cagtccttct accgtccata tttcattctc aaatctgtta
661 ttagaggaat gattctgatc tccacctacc atacacatgc cctgttgctt gttgggcctt
721 acactaaaat gttagagtat gatgacagat ggagttgtct gggtacattt gtgtgcattt
781 aagggtgata gtgtatttgc tctttaagag ctgagtgttt gagcctctgt ttgtgtgtaa
841 ttgagtgtgc atgtgtggga gtgaaattgt ggaatgtgta tgctcatagc actgagtgaa
901 aataaaagat tgtataaatc gtggggcatg tggaattgtg tgtgcctgtg cgtgtgcagt
961 attttttttt ttttaagtaa gccactttag atcttgtcac ctcccctgtc ttctgtgatt
1021 gattttgcga ggctaatggt gcgtaaaagg gctggtgaga tctgggggcg cctcctagcc
1081 tgacgtcaga gagagagttt aaaacagagg gagacggttg agagcacaca agccgcttta
1141 ggagcgaggt tcggagccat cgctgctgcc tgctgatccg cgcctagagt ttgaccagcc
1201 actctccagc tcggctttcg cggcgccgag atgctgtcct gccgcctcca gtgcgcgctg
1261 gctgcgctgt ccatcgtcct ggccctgggc tgtgtcaccg gcgctccctc ggaccccaga
1321 ctccgtcagt ttctgcagaa gtccctggct gctgccgcgg ggaagcaggt aaggagactc
1381 cctcgacgtc tcccggattc tccagccctc cctaagcctt gctcctgccc cattggtttg
1441 gacgtaaggg atgctcagtc cttctaaaga gttttggtgc ttttctgggt ccctcagctc
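Each sequence line begins with the coordinate of its first base, followed by blocks of ten bases. A minimal sketch for turning such lines back into one continuous string is shown below; the function name is an illustrative choice.

```python
def origin_to_sequence(lines):
    """Join ORIGIN lines into one string, dropping coordinates and spaces."""
    chunks = []
    for line in lines:
        fields = line.split()
        # fields[0] is the coordinate of the first base on the line.
        chunks.extend(fields[1:])
    return "".join(chunks)

lines = [
    "1 gaattcaagg acaggttttc ttaaactttc tttgtttcta ggagatcagg cagagctgaa",
    "61 tttaaccaag aatcttttga tcctttccac atatagatat acaatagtgg tcacatatgt",
]
seq = origin_to_sequence(lines)
print(len(seq), seq[:6])
```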
Nucleic acid data

GenBank anatomy: sequence types


• Genomic DNA
• Genomic RNA (from RNA viruses)
• Precursor RNA
• mRNA (cDNA)
• Ribosomal RNA (in ribosomes)
• Transfer RNA
• Small nuclear RNA (associated with RNA splicing)
• Small cytoplasmic RNA
• MicroRNA…
Nucleic acid data

GenBank: reliability
GenBank is highly redundant
• There are 27,702,323 human entries in GenBank.
• But there are only ~22,000 protein coding regions.
• Why the difference?

GenBank may contain errors
• Entries are made by researchers.
• Very few researchers update their entries in the database.
• It can be difficult to resolve conflicts between different entries.
Nucleic acid data

GenBank: diversity
20 most sequenced organisms
• Contains lots of information for a small number of species
• Note many plants…why?

Entries    Bases          Species
8849611    263280740770   Severe acute respiratory syndrome coronavirus 2
1943950    255975379562   Triticum aestivum
111961     205666444201   Hordeum vulgare
1347585    126087260773   Hordeum vulgare subsp. vulgare
520        106587373982   Hordeum bulbosum
164        93011095388    Viscum album
29876      92980158773    Hordeum vulgare subsp. spontaneum
10049258   43637339408    Mus musculus
27810338   36825525087    Homo sapiens
175081     24021381303    Escherichia coli
29811      21128005736    Avena sativa
2640663    20263158983    Arabidopsis thaliana
33665      16333452591    Klebsiella pneumoniae
2243780    16210185539    Bos taurus
1732296    13758456861    Danio rerio
312035     13104402122    Arachis hypogaea
195        11554711366    Sambucus nigra
28766      11286173222    Vicia faba
14812      10342955730    Triticum monococcum
23130      9981582961     Triticum turgidum subsp. durum

https://www.ncbi.nlm.nih.gov/genbank/release/current/
Nucleic acid data

Accessing sequence databases


• Searching the header
• Searching the annotations for keywords (organism, gene name, etc.)

• Searching the sequences


• Searching for sequences similar to a query sequence using programs
such as BLAST
• Searching for sequences containing particular patterns
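Pattern searching, the last option, can be sketched with a regular expression. The example looks for the EcoRI recognition site (GAATTC) in the first 60 bases of the somatostatin record. The helper name `find_pattern` is made up, and similarity searching with BLAST is a separate tool that is not shown here.

```python
import re

def find_pattern(sequence, pattern):
    """Return 1-based start positions of non-overlapping pattern matches."""
    return [match.start() + 1 for match in re.finditer(pattern, sequence)]

# First 60 bases of J00306, written without spaces.
seq = "gaattcaaggacaggttttcttaaactttctttgtttctaggagatcaggcagagctgaa"
print(find_pattern(seq, "gaattc"))
```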
Protein sequence data

Protein sequence database: UniProt


• Unified protein database incorporating multiple protein databases
• Collaboration between the Swiss Institute of Bioinformatics, the European Bioinformatics Institute (UK) and the Protein Information Resource (USA)

http://www.uniprot.org
Protein sequence data

UniProt: components
• UniProt Knowledgebase (UniProtKB)
• central access point for extensive curated protein information, including
function, classification, and cross-reference
• UniProt Non-redundant Reference (UniRef)
• combines closely related sequences into a single record to speed
searches
• UniProt Archive (UniParc)
• comprehensive repository, reflecting the history of all protein
sequences.
Protein sequence data

UniProt/Swiss-Prot: statistics
Release 2024_03 (May 2024)

571,609 sequence entries curated from 299,621 unique references and comprising 206,878,625 amino acids

https://web.expasy.org/docs/relnotes/relstat.html
Protein sequence data

UniProt/TrEMBL: statistics
Release 2024_03 (May 2024)

244,910,918 sequence entries, comprising 86,585,019,224 amino acids

https://www.ebi.ac.uk/uniprot/TrEMBLstats
Protein sequence data

UniProt/Swiss-Prot: top 20 species


Number Frequency Species
1 20435 Homo sapiens (Human)
2 17212 Mus musculus (Mouse)
3 16386 Arabidopsis thaliana (Mouse-ear cress)
4 8199 Rattus norvegicus (Rat)
5 6727 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
6 6046 Bos taurus (Bovine)
7 5121 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
8 4530 Escherichia coli (strain K12)
9 4472 Caenorhabditis elegans
10 4191 Bacillus subtilis (strain 168)
11 4187 Oryza sativa subsp. japonica (Rice)
12 4160 Dictyostelium discoideum (Social amoeba)
13 3778 Drosophila melanogaster (Fruit fly)
14 3506 Xenopus laevis (African clawed frog)
15 3332 Danio rerio (Zebrafish) (Brachydanio rerio)
16 2309 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
17 2309 Gallus gallus (Chicken)
18 2218 Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
19 2046 Escherichia coli O157:H7
20 1899 Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
Protein sequence data

UniProt/TrEMBL: taxonomic origins


[Figure: entries by taxonomic kingdom (e.g., Eukaryota)]
Protein sequence data

UniProt/Swiss-Prot anatomy: header


Alpha-1-antitrypsin protein 1
Information          Code
ID                   ID   A1AT_HUMAN STANDARD; PRT; 418 AA.
Accession            AC   P01009; Q13672; Q5U0M1; Q96BF9; Q96ES1; Q9P1P0;
Date                 DT   21-JUL-1986 (Rel. 01, Created)
                     DT   01-OCT-1996 (Rel. 34, Last sequence update)
                     DT   13-SEP-2005 (Rel. 48, Last annotation update)
Description          DE   Alpha-1-antitrypsin precursor (Alpha-1 protease inhibitor)
                     DE   (Alpha-1-antiproteinase).
Gene name            GN   Name=SERPINA1; Synonyms=AAT, PI; ORFNames=PRO0684, PRO2209;
Organism species     OS   Homo sapiens (Human).
Classification       OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                     OC   Mammalia; Eutheria; Euarchontoglires; Primates; Hominidae;
                     OC   Homo.
Cross reference      OX   NCBI_TaxID=9606;
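Because every line of the flat-file entry starts with a two-letter line code, the header is easy to read programmatically. The following sketch groups lines by code; the function name `group_by_line_code` is made up for illustration, and the sample is a subset of the entry above.

```python
from collections import defaultdict

def group_by_line_code(lines):
    """Map each two-letter line code (ID, AC, DE, ...) to its joined content."""
    grouped = defaultdict(list)
    for line in lines:
        code, _, content = line.partition(" ")
        grouped[code].append(content.strip())
    return {code: " ".join(parts) for code, parts in grouped.items()}

ENTRY = [
    "ID   A1AT_HUMAN STANDARD; PRT; 418 AA.",
    "AC   P01009; Q13672; Q5U0M1; Q96BF9; Q96ES1; Q9P1P0;",
    "OS   Homo sapiens (Human).",
    "OX   NCBI_TaxID=9606;",
]
fields = group_by_line_code(ENTRY)
# The first accession listed on the AC line is the primary one.
accessions = [acc.strip() for acc in fields["AC"].split(";") if acc.strip()]
print(accessions[0], fields["OS"])
```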
Protein sequence data

UniProt/Swiss-Prot anatomy: literature


Reference number     RN   [1]
Reference type       RP   NUCLEOTIDE SEQUENCE [MRNA].
External IDs         RX   MEDLINE=84107980; PubMed=6319097 [NCBI, ExPASy, EBI, Israel, Japan];
Authors              RA   Bollen A., Herzog A., Cravador A., Herion P., Chuchana P.,
                     RA   van der Straten A., Loriau R., Jacobs P., van Elsen A.;
Title                RT   "Cloning and expression in Escherichia coli of full-length
                     RT   complementary DNA coding for human alpha 1-antitrypsin.";
Journal information  RL   DNA 2:255-264(1983).
...
                     RN   [22]
                     RP   X-RAY CRYSTALLOGRAPHY (2.9 ANGSTROMS) OF 26-418.
                     RX   PubMed=9466920 [NCBI, ExPASy, EBI, Israel, Japan];
                     RA   Elliott P.R., Abrahams J.P., Lomas D.A.;
                     RT   "Wild-type alpha 1-antitrypsin is in the canonical inhibitory
                     RT   conformation.";
                     RL   J. Mol. Biol. 275:419-425(1998).
...
                     RN   [49]
                     RP   VARIANT Z-BRISTOL MET-109.
                     RX   MEDLINE=98120621; PubMed=9459000 [NCBI, ExPASy, EBI, Israel, Japan];
                     RA   Lovegrove J.U., Jeremiah S., Gillett G.T., Temple I.K., Povey S.;
                     RT   "A new alpha 1-antitrypsin mutation, Thr-Met 85, (PI ZBristol)
                     RT   associated with novel electrophoretic properties.";
                     RL   Ann. Hum. Genet. 61:385-391(1997).
Protein sequence data

UniProt/Swiss-Prot anatomy: functional information

CC -!- FUNCTION: Inhibitor of serine proteases. Its primary target is
CC elastase, but it also has a moderate affinity for plasmin and
CC thrombin.
CC -!- SUBCELLULAR LOCATION: Secreted.
CC -!- TISSUE SPECIFICITY: Plasma.
CC -!- DOMAIN: The reactive center loop (RCL) extends out from the body
CC of the protein and directs binding to the target protease. The
CC protease cleaves the serpin at the reactive site within the RCL.
CC -!- POLYMORPHISM: The sequence shown is that of the M1V allele which
CC is the most common form of PI (44 to 49%). Other frequent alleles
CC are: M1A 20 to 23%; M2 10 to 11%; M3 14 to 19%.
CC -!- DISEASE: The major physiological function of AAT is the protection
CC of the lower respiratory tract against proteolytic destruction by
CC human leukocyte elastase (HLE).

This information summarises data from the literature.


This knowledge is of enormous scientific value.
NOTE: the information may not be completely up-to-date. Why?
Protein sequence data

UniProt/Swiss-Prot anatomy: references and links

DNA sequence         DR   EMBL; K01396; AAB59375.1; -; mRNA.
                     DR   EMBL; K02212; AAB59495.1; -; Genomic_DNA.
Protein sequence     DR   PIR; A21853; ITHU.
Protein structure    DR   PDB; 1ATU; X-ray; A=45-418.
                     DR   PDB; 1D5S; X-ray; A=44-377, B=378-418.
2-D PAGE             DR   SWISS-2DPAGE; P01009; HUMAN.
                     DR   HSC-2DPAGE; P01009; HUMAN.
Domains              DR   InterPro; IPR000215; Prot_inh_serpin.
                     DR   InterPro; Graphical view of domains.
                     DR   Pfam; PF00079; Serpin; 1.
                     DR   Pfam; Graphical view of domain structure.
Keywords             KW   3D-structure; Acute phase; Sequencing;
                     KW   Serine protease inhibitor; Serpin.
Protein sequence data

UniProt/Swiss-Prot anatomy: features


Secretion signal       FT   SIGNAL        1     24
                       FT   CHAIN        25    418    Alpha-1-antitrypsin.
Protein modifications  FT   CARBOHYD     70     70    N-linked (GlcNAc...).
                       FT   CARBOHYD    107    107    N-linked (GlcNAc...).
                       FT   CARBOHYD    271    271    N-linked (GlcNAc...).
Polymorphisms          FT   VARIANT       4      4    S -> L (in Z-Wrexham).
                       FT                             /FTId=VAR_006978.
Known errors           FT   CONFLICT     12     12    Missing (in Ref. 4).
Structure              FT   TURN         49     50
                       FT   HELIX        51     68
                       FT   STRAND       74     76
                       FT   HELIX        78     89
                       FT   TURN         90     91
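Each feature line carries a key, a start and an end position (1-based, inclusive) and an optional description. A rough parsing sketch follows. It assumes whitespace-separated lines like those above and does not handle continuation lines such as `/FTId=...`; the function name is illustrative.

```python
def parse_feature(line):
    """Parse one 'FT KEY START END [description]' line into a dict."""
    parts = line.split(None, 4)
    return {
        "key": parts[1],
        "start": int(parts[2]),
        "end": int(parts[3]),
        "description": parts[4] if len(parts) > 4 else "",
    }

FT_LINES = [
    "FT SIGNAL 1 24",
    "FT CHAIN 25 418 Alpha-1-antitrypsin.",
    "FT CARBOHYD 70 70 N-linked (GlcNAc...).",
    "FT HELIX 51 68",
]
features = [parse_feature(line) for line in FT_LINES]
# Feature length in residues: end - start + 1.
lengths = {f["key"]: f["end"] - f["start"] + 1 for f in features}
print(lengths)
```

With these four lines the signal peptide covers 24 residues and the mature chain 394 (25 to 418).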
Protein sequence data

UniProt/Swiss-Prot anatomy: sequence

SQ line fields: length, molecular weight, checksum (64-bit cyclic redundancy)

SQ   SEQUENCE 418 AA; 46737 MW; 7016555F273B7F16 CRC64;

Amino acid sequence:
     MPSSVSWGIL LLAGLCCLVP VSLAEDPQGD AAQKTDTSHH DQDHPTFNKI TPNLAEFAFS
     LYRQLAHQSN STNIFFSPVS IATAFAMLSL GTKADTHDEI LEGLNFNLTE IPEAQIHEGF
     QELLRTLNQP DSQLQLTTGN GLFLSEGLKL VDKFLEDVKK LYHSEAFTVN FGDTEEAKKQ
     INDYVEKGTQ GKIVDLVKEL DRDTVFALVN YIFFKGKWER PFEVKDTEEE DFHVDQVTTV
     KVPMMKRLGM FNIQHCKKLS SWVLLMKYLG NATAIFFLPD EGKLQHLENE LTHDIITKFL
     ENEDRRSASL HLPKLSITGT YDLKSVLGQL GITKVFSNGA DLSGVTEEAP LKLSKAVHKA
     VLTIDEKGTE AAGAMFLEAI PMSIPPEVKF NKPFVFLMIE QNTKSPLFMG KVVNPTQK
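The SQ line summarises the sequence that follows it. A small sketch for reading its three fields and for joining the ten-residue blocks is given below; the function names and dictionary keys are illustrative choices.

```python
import re

def parse_sq_line(line):
    """Extract length, molecular weight and CRC64 checksum from an SQ line."""
    match = re.match(
        r"SQ\s+SEQUENCE\s+(\d+) AA;\s+(\d+) MW;\s+([0-9A-F]+) CRC64;", line
    )
    if match is None:
        raise ValueError("not an SQ line")
    return {
        "length": int(match.group(1)),
        "mol_weight": int(match.group(2)),
        "crc64": match.group(3),
    }

def join_sequence(lines):
    """Remove the spaces between ten-residue blocks and join all lines."""
    return "".join("".join(line.split()) for line in lines)

info = parse_sq_line("SQ SEQUENCE 418 AA; 46737 MW; 7016555F273B7F16 CRC64;")
start = join_sequence(["MPSSVSWGIL LLAGLCCLVP"])
print(info["length"], info["mol_weight"], start)
```

Comparing `info["length"]` with the length of the joined sequence is a cheap sanity check after parsing a full entry.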
Specialized databases

Specialized sequence databases


• Focus on a specific type of sequence
• Sequences are often modified or specially annotated
• Usage depends on the database
• Examples:
• Non-coding RNA databases (https://rnacentral.org/expert-databases)
• Immunology databases (e.g., ImMunoGeneTics, IMGT)
• Cancer databases (e.g., The Cancer Genome Atlas, TCGA; https://portal.gdc.cancer.gov/)
Specialized databases

TCGA: 20,000 primary cancer and matched normal samples spanning 33 cancer types
https://portal.gdc.cancer.gov/

gnomAD: 76,156 genomes from ~9 ancestries/populations
https://gnomad.broadinstitute.org/
Specialized databases

Using the right words: ontologies

The Gene Ontology (GO)

MOLECULAR FUNCTION
Nucleic acid binding | Enzyme
DNA binding | Helicase | Adenosine triphosphatase
Chromatin binding | DNA helicase | ATP-dependent helicase | DNA-dependent adenosine triphosphatase
ATP-dependent DNA helicase

http://geneontology.org/docs/introduction-to-go
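An ontology like this is a directed acyclic graph: a term can have more than one parent. The sketch below stores child-to-parent links in a dictionary and walks upwards. The specific edges are an assumption based on one reading of the diagram, not taken from the GO files, and the term names are spelled as plain text rather than GO identifiers.

```python
# child -> parents (edges assumed from the diagram, for illustration only)
PARENTS = {
    "nucleic acid binding": ["molecular function"],
    "enzyme": ["molecular function"],
    "DNA binding": ["nucleic acid binding"],
    "helicase": ["enzyme"],
    "adenosine triphosphatase": ["enzyme"],
    "chromatin binding": ["DNA binding"],
    "DNA helicase": ["DNA binding", "helicase"],
    "ATP-dependent helicase": ["helicase", "adenosine triphosphatase"],
    "DNA-dependent adenosine triphosphatase": ["adenosine triphosphatase"],
    "ATP-dependent DNA helicase": [
        "DNA helicase",
        "ATP-dependent helicase",
        "DNA-dependent adenosine triphosphatase",
    ],
}

def ancestors(term):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("DNA helicase")))
```

Annotating a protein with a specific term implicitly annotates it with all of that term's ancestors. This is why a shared vocabulary makes database searches more reliable.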
Programmatic access

Programmatic access to databases: RESTful APIs
• REST: Representational State Transfer
• API: Application Programming Interface
• Many databases are designed so that their content can be accessed in a
consistent way, to allow automation and scripting
• Note that the syntax is consistent within one database but will differ
between databases
• (Broadly) access:
• BaseURL: where the content sits
• Query: search phrase/syntax used to extract the information
• Fields: which part of the database/table to search within
• Format: how you want the output
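The four parts can be assembled into a request URL with Python's standard library. This sketch uses the UniProt base URL and query syntax as an example. The helper names `build_query` and `build_url` are made up, and other databases will use different parameter names.

```python
from urllib.parse import urlencode

def build_query(criteria):
    """Combine field:value pairs into one search phrase joined by AND."""
    return " AND ".join(f"{field}:{value}" for field, value in criteria.items())

def build_url(base_url, endpoint, criteria, fmt):
    """Assemble BaseURL + endpoint + URL-encoded query and format."""
    params = {"query": build_query(criteria), "format": fmt}
    return base_url.rstrip("/") + "/" + endpoint + "?" + urlencode(params)

url = build_url(
    "https://rest.uniprot.org/uniprotkb/",
    "search",
    {"reviewed": "true", "organism_id": "9606"},
    "tsv",
)
print(url)
```

`urlencode` percent-encodes the colons and turns spaces into `+`, so the printed URL looks slightly different from a hand-typed one but requests the same thing.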
Programmatic access

Demo: UniProt
Guide: http://www.uniprot.org/help/programmatic_access
BaseURL: https://rest.uniprot.org/uniprotkb/
Examples:
Web browser:
https://rest.uniprot.org/uniprotkb/search?query=reviewed:true+AND+organism_id:9606&format=tsv
Terminal:
curl -H "Accept: text/plain; format=tsv" \
  "https://rest.uniprot.org/uniprotkb/search?query=reviewed:true+AND+organism_id:9606"
Parts of the terminal example:
• Format: the Accept header (format=tsv)
• BaseURL: https://rest.uniprot.org/uniprotkb/
• Query: reviewed:true AND organism_id:9606
• Fields: reviewed, organism_id
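The terminal example can be mirrored in Python using only the standard library. This is a sketch rather than an official client: the function names are made up, `fetch_tsv` needs network access, and the search endpoint may return only a first page of results.

```python
import csv
import io
import urllib.request

URL = (
    "https://rest.uniprot.org/uniprotkb/search"
    "?query=reviewed:true+AND+organism_id:9606"
)

def make_request(url=URL):
    """Build the request with the same Accept header as the curl example."""
    return urllib.request.Request(
        url, headers={"Accept": "text/plain; format=tsv"}
    )

def fetch_tsv(url=URL):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(make_request(url)) as response:
        return response.read().decode("utf-8")

def parse_tsv(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Usage (requires network access):
# rows = parse_tsv(fetch_tsv())
# print(len(rows), "rows returned")
```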
Programmatic access

Other useful APIs


• Gene Ontology:
• Guide: http://geneontology.org/docs/tools-guide/
• BaseURL: https://api.geneontology.org/api
• Example:
http://api.geneontology.org/api/bioentity/function/GO:0006915
• NCBI (E-utils):
• Guide: https://www.ncbi.nlm.nih.gov/books/NBK25501/
• BaseURL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
• Example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
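The E-utils example follows the same BaseURL plus query pattern. The sketch below builds the `esearch` URL from its `db` and `term` parameters; the function name is made up for illustration, and no request is sent.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term):
    """Build an esearch URL for the given database and search term."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})

url = esearch_url("pubmed", "science[journal] AND breast cancer AND 2008[pdat]")
print(url)
```

The square brackets are percent-encoded as `%5B` and `%5D` by `urlencode`.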
Programmatic access

Programmatic access: local download


• Common flat file formats
• Tabular (TSV or CSV, or some other delimiter)
• OBO (Open Biomedical Ontologies): biology-oriented language for building ontologies, based on OWL
• OWL (Web Ontology Language)
• JSON (JavaScript Object Notation)
• XML (Extensible Markup Language)
• Relational formats
• SQL or SQLite
• Download either from an FTP site or an API site
• Access through FTP or tools like Aspera
• Search, parse or query local versions
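As an example of parsing a local copy, the sketch below reads `[Term]` stanzas from OBO-formatted text, where each line inside a stanza is a `tag: value` pair. The stanza shown is abbreviated to three tags for illustration (real entries carry more), and the function name is made up.

```python
def parse_obo_terms(text):
    """Collect the tag/value pairs of every [Term] stanza in OBO text."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # other stanza types, e.g. [Typedef]
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # A tag may repeat within a stanza, so values are kept in lists.
            current.setdefault(tag, []).append(value)
    return terms

OBO_TEXT = """[Term]
id: GO:0003723
name: RNA binding
namespace: molecular_function
"""

terms = parse_obo_terms(OBO_TEXT)
print(terms[0]["id"][0], terms[0]["name"][0])
```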
File formats

OBO vs OWL: example GO:0003723


File formats

JSON: example GO:0003723


Further reading/resources
• http://www.ncbi.nlm.nih.gov/Genbank/index.html
• https://www.ncbi.nlm.nih.gov/genbank/release/current/
• https://web.expasy.org/docs/relnotes/relstat.html
• https://www.ebi.ac.uk/uniprot/TrEMBLstats
• http://geneontology.org
• https://www.ncbi.nlm.nih.gov/books/NBK25501/
• http://www.uniprot.org
• https://portal.gdc.cancer.gov/
• https://gnomad.broadinstitute.org/
• https://rnacentral.org/expert-databases
