Lecture 4 Nucleic Acid Sequence Database
Databases
• GenBank, EMBL and DDBJ are the three primary nucleotide sequence databases.
GENBANK
• www.ncbi.nlm.nih.gov/Genbank/
• An annotated collection of all publicly available nucleotide and protein sequences.
EMBL
• http://www.ebi.ac.uk/embl.html
• An annotated collection of all publicly available nucleotide and protein sequences.
DDBJ (DNA Data Bank of Japan)
• http://www.ddbj.nig.ac.jp
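Entries in these primary databases are annotated records. GenBank distributes them as flat files in which the sequence follows an `ORIGIN` line as numbered blocks of 10 bases, and each record ends with `//`. The code below is a minimal sketch under that assumption that pulls out the accession and the raw sequence. The record text is an invented toy example, not a real database entry, and `parse_genbank_record` is a hypothetical helper name. In practice a dedicated parser such as Biopython's `SeqIO` would normally be used instead.

```python
import re

# A toy, invented GenBank-style record (NOT a real database entry).
RECORD = """LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atggcgtacg ttagcatgca tgca
//
"""


def parse_genbank_record(text):
    """Return (accession, sequence) from one GenBank flat-file record.

    Minimal sketch (hypothetical helper): reads only the ACCESSION line
    and the ORIGIN block, ignoring all other annotation.
    """
    accession = None
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            # The first token after the keyword is the primary accession.
            accession = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_origin:
            # Drop the position number and the spaces between 10-base blocks.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return accession, "".join(seq_parts).upper()


acc, seq = parse_genbank_record(RECORD)
print(acc, len(seq), seq)  # → TOY0001 24 ATGGCGTACGTTAGCATGCATGCA
```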
OTHER NUCLEOTIDE DATABASES
• UniGene www.ncbi.nlm.nih.gov/UniGene/
– The UniGene system attempts to process the GenBank sequence data into a non-redundant set of gene-oriented clusters.
• SGD genome-www.stanford.edu/Saccharomyces/
– The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
• EBI Genomes www.ebi.ac.uk/genomes/
– This website provides access to, and statistics for, the completed genomes, as well as information about ongoing projects.
• Genome Biology www.ncbi.nlm.nih.gov/Genomes/
– The Genome Biology site at NCBI contains information about the available complete genomes.
• Ensembl www.ensembl.org
– Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system that produces and maintains automatic annotation of eukaryotic genomes.
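The UniGene entry describes collapsing redundant GenBank sequences into gene-oriented clusters. UniGene's actual pipeline is not described in these notes. The code below is only a hypothetical illustration of the general idea: sequences that share at least one k-mer are grouped transitively with a union-find (disjoint-set) structure, so that overlapping fragments of the same transcript end up in one cluster. The value k = 8, the function names and the example reads are all assumptions chosen for illustration.

```python
from collections import defaultdict


def find(parent, i):
    # Union-find "find" with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_by_shared_kmers(seqs, k=8):
    """Group sequence indices that share at least one k-mer (transitively).

    Toy illustration only, not UniGene's algorithm; k=8 is an assumption.
    """
    parent = list(range(len(seqs)))
    first_seen = {}  # k-mer -> index of the first sequence containing it
    for idx, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            kmer = s[j:j + k]
            if kmer in first_seen:
                # Merge this sequence's cluster with the earlier sequence's cluster.
                a, b = find(parent, idx), find(parent, first_seen[kmer])
                if a != b:
                    parent[a] = b
            else:
                first_seen[kmer] = idx
    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(parent, idx)].append(idx)
    return sorted(clusters.values())


# Invented reads: two overlapping fragments of one transcript plus an unrelated read.
reads = [
    "ATGGCGTACGTTAGCATG",  # fragment 1
    "TTAGCATGCATGCAAAGC",  # shares the 8-mer "TTAGCATG" with fragment 1
    "CCCCGGGGAAAATTTT",    # unrelated
]
print(cluster_by_shared_kmers(reads))  # → [[0, 1], [2]]
```

Real clustering pipelines use alignment-based similarity rather than a single shared k-mer. A single shared k-mer is the simplest criterion that still shows how the "non-redundant, gene-oriented" grouping emerges from transitive merging.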
Protein Information Resource (PIR)
UNIPROT
databases.
– UniProt Reference Clusters (UniRef): removing sequence redundancy by
• Pfam is a database of curated protein families, each of which is defined by two alignments (a seed alignment and a full alignment) and a profile hidden Markov model (HMM).
• In Pfam, the profile HMM is searched against a large sequence collection, based on the UniProt Knowledgebase (UniProtKB), to find all instances of the family.
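Pfam's profile HMMs contain match, insert and delete states and are searched with the HMMER software. The code below is a much-simplified sketch of the core idea. It uses an ungapped, match-states-only profile, which is essentially a position-specific scoring matrix rather than a full profile HMM. It builds per-column log-odds scores from a small seed alignment and slides the profile along a target sequence to find the best-scoring window. The seed alignment, the pseudocount of 1, the uniform background frequency and the function names are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency


def build_profile(seed_alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per (ungapped) alignment column."""
    profile = []
    for col in range(len(seed_alignment[0])):
        column = [seq[col] for seq in seed_alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / BACKGROUND)  # log-odds vs background
        profile.append(scores)
    return profile


def best_hit(profile, target):
    """Slide the ungapped profile along target; return (best_score, start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        score = sum(profile[i][aa] for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best


# Invented, ungapped seed alignment for a hypothetical five-residue motif.
seed = ["GKSTL", "GKSTV", "GRSTL", "GKTTL"]
profile = build_profile(seed)
score, start = best_hit(profile, "MAAWGKSTLPQ")
print(start)  # → 4
print(round(score, 2))
```

A full profile HMM adds position-specific insertion and deletion probabilities and scores a sequence with dynamic programming (Viterbi or Forward). Without those states, scoring reduces to the sum of per-column log-odds shown here.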
PROSITE DATABASE