Class05- Molecular sequence database - 2022
SIV 2001
Class 5
Molecular sequence database
2/11/2022
GenBank
• Public nucleotide sequence database
• Development of software tools for sequence analysis
Organisms in GenBank
GenBank Flat File
• Header: title, taxonomy, citation
• Features (seq)
• DNA sequence
            cds.
ACCESSION   AF115338
VERSION     AF115338.1  GI:4959391
KEYWORDS    .
SOURCE      Pseudomonas fluorescens.
  ORGANISM  Pseudomonas fluorescens
            Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae;
            Pseudomonas.
REFERENCE   1  (bases 1 to 591)
  AUTHORS   Brinkman,F.S., Schoofs,G., Hancock,R.E. and De Mot,R.
  TITLE     Influence of a putative ECF sigma factor on expression of the major
            outer membrane protein, OprF, in Pseudomonas aeruginosa and
            Pseudomonas fluorescens
  JOURNAL   J. Bacteriol. 181 (16), 4746-4754 (1999)
  MEDLINE   99369842
   PUBMED   10438740
REFERENCE   2  (bases 1 to 591)
  AUTHORS   De Mot,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-DEC-1998) F.A. Janssens Laboratory of Genetics,
            Applied Plant Sciences, K. Mercierlaan 92, Heverlee B-3001, Belgium
FEATURES             Location/Qualifiers
     source          1..591
                     /organism="Pseudomonas fluorescens"
                     /strain="M114"
                     /db_xref="taxon:294"
     gene            1..591
                     /gene="sigX"
     CDS             1..591
                     /gene="sigX"
                     /codon_start=1
                     /transl_table=11
                     /product="ECF sigma factor SigX"
                     /protein_id="AAD34329.1"
                     /db_xref="GI:4959392"
                     /translation="MNKAQTLSTRYDPRELSDEELVARSHTELFHVTRAYEELMRRYQ
                     RTLFNVCARYLGNDRDADDVCQEVMLKVLYGLKNLEGKSKFKTWLYSITYNECITQYR
                     KERRKRRLMDALSLDPLEEASEEKALQPEEKGGLDRWLVYVNPIDRGILVLRFVAELE
                     FQEIADIMHMGLSATKMRYKRALDKLREKFAGETET"
BASE COUNT      157 a    133 c    170 g    131 t
ORIGIN
        1 atgaataaag cccaaacgct atccacgcgc tacgaccccc gcgagctctc tgatgaggag
       61 ttggtcgcgc gctcgcatac cgagcttttt cacgtaacgc gcgcctatga agaactgatg
      121 cggcgttacc agcgaacatt atttaacgtt tgtgcgagat atcttgggaa cgatcgcgac
      181 gcagacgatg tctgtcagga agtcatgttg aaggtgctgt atggcctgaa gaacctcgag
      241 gggaaatcga agttcaaaac gtggctctac agcatcacgt acaacgaatg tattacgcag
      301 tatcggaagg aacggcgaaa gcgtcgcttg atggacgcat tgagtcttga ccccctcgag
      361 gaagcgtccg aagaaaaggc gcttcaaccc gaggagaagg gcgggcttga tcgctggctg
      421 gtgtatgtga acccgattga ccgtggaatt ctggtgcttc gatttgtcgc agagctggaa
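The header of a flat file such as the AF115338 record can be read with a few lines of Python. The following is a minimal sketch, not an official parser: the function name `parse_flat_file_header` is hypothetical, and it assumes keywords start within the first 12 columns while continuation lines are indented past them. Repeated keywords (for example the two REFERENCE blocks) are simply merged.

```python
def parse_flat_file_header(text):
    """Collect the keyword fields that precede the FEATURES table."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break  # the header ends where the feature table begins
        if line[:12].strip():
            # A keyword (e.g. ACCESSION, ORGANISM) opens a new field.
            parts = line.split(None, 1)
            current = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            fields.setdefault(current, [])
        else:
            # Continuation line: belongs to the most recent keyword.
            value = line.strip()
        if current is not None and value:
            fields[current].append(value)
    return {key: " ".join(values) for key, values in fields.items()}


record = """\
ACCESSION   AF115338
VERSION     AF115338.1  GI:4959391
SOURCE      Pseudomonas fluorescens.
  ORGANISM  Pseudomonas fluorescens
            Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae;
            Pseudomonas.
FEATURES             Location/Qualifiers
"""

header = parse_flat_file_header(record)
print(header["ACCESSION"])
print(header["VERSION"].split()[0])
print(header["ORGANISM"])
```

For real work an established parser (for example the one in Biopython) is likely a safer choice than a hand-written one.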
Genome database
• What Genomes Are Available?
• The list of completed genomes grows almost every week
• GOLD website: listing of finished and “in progress” genomes
http://www.genomesonline.org/
https://www.ncbi.nlm.nih.gov/genome/viruses/
SRA database
Protein Databases
• Protein sequence databases
• Protein properties
• Protein localization and targeting
• Protein sequence motifs and active sites
• Protein domain databases; protein classification
• Databases of individual protein families
• Protein structure database
PIR (Protein Information Resource)
• https://pir.georgetown.edu/
• The oldest universal curated protein sequence database.
• Published as the ‘Atlas of Protein Sequence and Structure’ from 1965 to 1978 by the late Margaret O. Dayhoff.
• Established in 1984 as a successor to the original National Biomedical Research Foundation Protein Sequence Database.
SWISS-PROT
• https://web.expasy.org/docs/swiss-prot_guideline.html
• Manually curated, non-redundant protein sequence database.
• Highly integrated with other databases.
TrEMBL
• TrEMBL (Translation from EMBL) database (http://www.ebi.ac.uk/trembl/)
• Automatically curated and derived from the translation of all coding sequences in the DDBJ/EMBL/GenBank nucleotide sequence database that are not yet included in Swiss-Prot.
• http://www.bioinfo.pte.hu/more/TrEMBL.htm
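Translating a coding sequence, the step TrEMBL entries are derived from, can be illustrated with a short sketch. This is not TrEMBL's actual pipeline; the names `CODON_TABLE` and `translate` are hypothetical. It uses the standard genetic code, whose amino acid assignments are, to my knowledge, the same as those of translation table 11 used by the GenBank record AF115338 (the tables differ in permitted start codons). The input is the first 60 bases of that record.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    seq = "".join(cds.split()).upper()  # drop spaces and line breaks
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


first_60 = "atgaataaag cccaaacgct atccacgcgc tacgaccccc gcgagctctc tgatgaggag"
print(translate(first_60))  # MNKAQTLSTRYDPRELSDEE
```

The printed peptide matches the start of the /translation qualifier in the same record.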
UniProt
FASTA
Accession number
• Labels and identifies a sequence record so that its information can be retrieved.
• A string of 4 to 12 characters that is associated with a molecular sequence record.
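As a small illustration, the VERSION field of the GenBank record AF115338 combines the accession number with a version suffix (AF115338.1). The sketch below separates the two and checks the accession against the 4 to 12 character range given above. The function name `split_accession_version` is hypothetical, and the length check reflects only the rule stated on this slide, not a full validation of any database's accession formats.

```python
def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version or None)."""
    accession, dot, version = identifier.partition(".")
    return accession, int(version) if dot else None


def has_plausible_length(accession):
    """Check the 4 to 12 character range stated for accession numbers."""
    return 4 <= len(accession) <= 12


accession, version = split_accession_version("AF115338.1")
print(accession, version, has_plausible_length(accession))
```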