
Published online 17 October 2015
Nucleic Acids Research, 2016, Vol. 44, Database issue D457–D462
doi: 10.1093/nar/gkv1070

KEGG as a reference resource for gene and protein annotation

Minoru Kanehisa1,*, Yoko Sato2, Masayuki Kawashima2, Miho Furumichi1 and Mao Tanabe1

1 Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan and 2 Healthcare Solutions Department, Fujitsu Kyushu Systems Ltd., Hakata-ku, Fukuoka 812-0007, Japan

Received September 15, 2015; Revised October 04, 2015; Accepted October 05, 2015

* To whom correspondence should be addressed. Tel: +81 774 38 4521; Fax: +81 774 38 3269; Email: [email protected]

© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.

INTRODUCTION

Thanks to the advancement of sequencing technologies, it is now a routine task to determine the genome sequence of an organism or an environmental sample that contains multiple organisms. However, it still remains a challenging task to fully understand biological meanings encoded in the genome. In 1995, we initiated the KEGG (Kyoto Encyclopedia of Genes and Genomes) database project as part of the Japanese Human Genome Program, in anticipation of the need for a reference knowledge base for biological interpretation of genome sequence data (1). The main objective of KEGG has been to establish links from collective sets of genes in the genome to high-level functions of the cell and the organism. We have developed, among others, the KEGG PATHWAY database as a representation of high-level functions, the KEGG GENES database as a collection of completely sequenced genomes, and the KO (KEGG Orthology) database for linking genes to high-level functions. With the arrival of high-throughput biology KEGG has become one of the most widely used biological databases in the world.

Genome annotation in KEGG is done differently from most other databases. First, molecular functions are stored in the KO database and associated with ortholog groups in order to enable extension of experimental evidence in a specific organism to other organisms. Annotation of individual genes in the GENES database is simply to create links to the KO database by assigning KO entry identifiers called K numbers. No updates are made to original data, such as gene names and descriptions in the GENES database, even if they are inconsistent with the KO assignment. Second, ortholog groups are defined in the context of KEGG pathway maps and other molecular networks, which are all created as networks of K number nodes. Thus, the genome annotation procedure to convert a gene set in the genome to a K number set leads to automatic reconstruction of KEGG pathways and other networks, enabling interpretation of high-level functions. Obviously, the quality of the KO database is critical in this procedure. Over the last two years, major efforts have been made to improve its quality.

In early 2015, we decided to remove the restriction of complete genomes in the KEGG GENES database. We first added the categories of viruses and plasmids, which are important in the analysis of metagenomes and antimicrobial resistance, respectively, as described below. We then introduced the addendum category where, for the first time, we started collecting protein sequence data from published literature rather than just importing complete genome sequences from RefSeq or GenBank. This is necessary because a pathway map created from literature information sometimes contains genes and proteins from organisms whose genome sequences are not known and would never be known. By expanding this addendum category it is now possible to capture all knowledge about gene/protein functions that can be associated with sequence data.

OVERVIEW AND NEW DEVELOPMENTS

Overview of KEGG

KEGG is an integrated database resource consisting of 16 main databases, which are categorized into systems, genomic, chemical and health information as shown in Table 1. The PATHWAY, BRITE and MODULE databases in the systems information category contain KEGG pathway maps, BRITE hierarchy and table files and KEGG modules, respectively, as representations of high-level functions. They are all manually created based on published literature. The BRITE table file is a newly introduced representation, which can be compared with the multi-column BRITE hierarchy file. When the data size is not large it is much easier to capture the overall relationship in a tabular form with a few columns optionally used for representation of hierarchy. BRITE table files are mainly used for drug classifications and for presenting various relationships involving diseases and drugs.

The genomic information category contains the GENOME and GENES databases for collections of organisms with complete genomes and their gene catalogs, which are mostly taken from RefSeq (2) and GenBank (3) databases. As mentioned, the GENES database now contains additional gene sets not related to complete genomes. There are also other databases not listed in Table 1: computationally generated sequence similarity database SSDB and auxiliary gene catalog databases DGENES and MGENES for draft genomes and metagenomes, respectively. The KO database containing ortholog groups associated with molecular functions is a hub for linking genomic information to systems information through the KEGG mapping procedure and also to chemical information through the dual aspect of the metabolic network (4).

The COMPOUND, GLYCAN, REACTION, RPAIR, RCLASS and ENZYME databases in the chemical information category contain chemical substances and reactions and are collectively called KEGG LIGAND for historical reasons. The ENZYME database originates from the database of Enzyme Nomenclature (5). There is also a small data set of reaction modules (1,4), which can be used for annotation of enzyme genes.

The health information category consists of the DISEASE, DRUG, DGROUP and ENVIRON databases for disease and drug information. DGROUP is a newly added database, which is being developed for grouping functionally identical or similar drugs in the drug interaction networks. KEGG MEDICUS is an interface for the general public integrating these internally developed databases with drug labels (package inserts) of all marketed drugs in Japan and the USA. The Japanese version of KEGG MEDICUS is especially advanced in this integration, and heavily accessed mostly through web search engines.

Experimental evidence for each KO

The development of the KO database is tightly coupled with the development of KEGG molecular networks including KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Ideally a KO represents a single sequence similarity group with an appropriate level of similarity. In reality, there are a number of complications. A single KO may consist of multiple sequence similarity groups. A small group with a high similarity threshold is a subset of a larger group with a lower similarity threshold, in which case two KOs are defined as the small group and the large group excluding the small group part. As long as the constituent sequence similarity groups are well defined including these examples, the KOALA (KEGG Orthology and Links Annotation) program (1) to computationally assign K numbers works well. However, there are still a small number of legacy KOs converted from Enzyme Commission (EC) number groups, whose associated sequence data are not well defined.

Internally KO grouping is constantly updated in the manual verification part of the KOALA annotation procedure (1). For outside users the basis of KO grouping and its correspondence to molecular function should be made clear by experimental evidence. Thus, major efforts have been initiated to annotate individual KOs with reference information reporting experiments on functional characterization of genes and proteins and, whenever possible, protein sequence data used in the experiments, such as those submitted to the INSDC (DDBJ/ENA/GenBank) database or those stated in the reference. As of September 2015, references (PubMed links) and sequence data (GENES links) are included in 76% and 45%, respectively, of about 19 000 KO entries. The sequence data listed in the KO entry can now be considered as the core sequence(s) from which an ortholog group has been defined.

New additions to GENES database

For many years the KEGG GENES database was created from NCBI’s RefSeq database. Since mid-2014, newly sequenced prokaryotic genomes are taken from GenBank, and since mid-2015, existing prokaryotic genomes, excluding the NCBI reference genomes, are updated using GenBank, for the current RefSeq entries produced by the NCBI Prokaryotic Genome Annotation Pipeline (6) are very different from previous versions. No changes have been made to eukaryotic genomes. The data source of KEGG GENES is summarized in Table 2.

Eukaryotes and prokaryotes with complete genomes constitute KEGG organisms identified by three- or four-letter organism codes. As shown in this table, there are three additional categories, viruses, plasmids and addendum, with two-letter codes of vg, pg and ag, respectively. The viruses and plasmids categories are taken from RefSeq collections. The annotation (K-number assignment) rate is very low for viruses, about 7% compared to 46% for KEGG organisms, but this category is useful in metagenome annotation. Many plasmids are included in the complete genomes of KEGG organisms, and the remaining ones are selected and stored in the plasmids category.

Table 1. The KEGG resource including drug labels

Category | Database name | Content
Systems information | KEGG PATHWAY | KEGG pathway maps
 | KEGG BRITE | BRITE functional hierarchies and BRITE tables
 | KEGG MODULE | KEGG modules
Genomic information | KEGG ORTHOLOGY | KEGG Orthology (KO) groups
 | KEGG GENOME | KEGG organisms (complete genomes)
 | KEGG GENES | Gene catalogs of KEGG organisms, viruses, plasmids and addendum category
Chemical information (KEGG LIGAND) | KEGG COMPOUND | Metabolites and other small molecules
 | KEGG GLYCAN | Glycans
 | KEGG REACTION | Biochemical reactions
 | KEGG RPAIR | Reactant pairs
 | KEGG RCLASS | Reaction class
 | KEGG ENZYME | Enzyme nomenclature
Health information (KEGG MEDICUS) | KEGG DISEASE | Human diseases
 | KEGG DRUG | Drugs
 | KEGG DGROUP | Drug groups
 | KEGG ENVIRON | Crude drugs and health-related substances
 | JAPIC (a) | Drug labels in Japan
 | DailyMed (b) | Drug labels in the USA (links only)

(a) http://www.japic.or.jp/
(b) http://dailymed.nlm.nih.gov/

Table 2. Data source of KEGG GENES

Category | Primary data source | Genome identifier | Gene identifier
Eukaryotes | RefSeq: RefSeq release (a) (complete) | T0 numbers (three- or four-letter organism codes) | GeneID
Prokaryotes | RefSeq: NCBI reference genomes (b) | T0 numbers (three- or four-letter organism codes) | Locus tag
 | GenBank: other complete genomes listed in prokaryotes.txt (c) | T0 numbers (three- or four-letter organism codes) | Locus tag
Viruses | RefSeq: RefSeq release (a) (viral) | T40000 (vg) | GeneID
Plasmids | RefSeq: RefSeq release (a) (plasmid) | T20000 (pg) | GeneID
Addendum | PubMed: functionally characterized genes | T10000 (ag) | ProteinID

(a) ftp://ftp.ncbi.nlm.nih.gov/refseq/release/
(b) http://www.ncbi.nlm.nih.gov/genome/browse/reference/
(c) ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
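
The category codes in Table 2 can be turned into a small lookup. The sketch below assumes, purely for illustration, that a GENES entry is written as `code:gene`, where `code` is either one of the two-letter category codes or a three- or four-letter organism code; that identifier layout and the function name `genes_category` are assumptions, not something stated in the text.

```python
# Category codes and genome identifiers as listed in Table 2.
SPECIAL_CATEGORIES = {
    "vg": ("Viruses", "T40000"),
    "pg": ("Plasmids", "T20000"),
    "ag": ("Addendum", "T10000"),
}


def genes_category(entry_id: str) -> str:
    """Return the GENES category for an identifier of the assumed form 'code:gene'."""
    code, sep, gene = entry_id.partition(":")
    if not sep or not gene:
        raise ValueError(f"expected 'code:gene', got {entry_id!r}")
    if code in SPECIAL_CATEGORIES:
        return SPECIAL_CATEGORIES[code][0]
    # KEGG organisms are identified by three- or four-letter organism codes.
    if len(code) in (3, 4) and code.isalnum():
        return "KEGG organism"
    raise ValueError(f"unrecognized code {code!r}")
```

Checking the two-letter codes first keeps the three additional categories separate from the organism codes, which mirrors how the table distinguishes them by code length.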

The addendum category is a collection of manually created protein sequence entries. In the KEGG pathway maps, there used to be cases where no corresponding genes could be found in KEGG organisms, thus, only links to UniProt (7) were given. In order to associate them with sequence data and K numbers, addendum entries are created using the original sequence data with International Nucleotide Sequence Database Collaboration (INSDC) protein accession numbers. In addition, there are two focused areas where sequence records are being created. One is Enzyme Nomenclature. As we do for each KO entry, we believe that each EC number entry should be linked to the sequence data used in the original experiment, so that the sequence similarity based extension of EC number assignment can safely be done. Thus, we are trying to create a list of protein sequences from the reference list of the Enzyme Nomenclature database (5). Another focused area is antimicrobial resistance (AMR), which will be discussed later.

Taxonomy and pangenomes

The increasing number of sequenced genomes, especially those of closely related bacterial strains, poses problems of how to process and represent them in the database. Attempts are made to define selected sets of genomes, such as reference and representative genomes in RefSeq (6) and reference proteomes in UniProt (7), which include both well-studied organisms and taxonomic diversity. In KEGG, such reference genomes are not explicitly defined, but the ordering of KEGG organisms contains preference of genomes. The KEGG organisms ordering is consistent with that of the NCBI taxonomy (8), but members in each taxonomic rank are manually ordered, not alphabetically ordered. The first genome in each taxonomic rank is considered as a reference genome, which is used for generating the following pangenome data sets.

As an organism’s functional capacity is represented by the set of assigned K numbers (KO content), the functional capacity of an organism group is represented by the combined set of assigned K numbers. A pangenome data set, as we define it here, is created by removing similar organisms, but retaining the KO content, at the species, genus or family level. When multiple members are present in each species/genus/family group, the first genome in the KEGG organisms order is taken as a representative genome. When the other members in the group contain different K numbers that are not present in the representative genome, those genes are added as if they are present in additional chromosomes or plasmids.
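
The pangenome construction described above can be sketched at the level of K number sets. This is a simplified illustration, not the procedure actually used for KEGG: it works on K numbers only (the real data set retains gene sequences), it assumes the input list is already in KEGG organism order, and the function name `build_pangenome` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_pangenome(
    ordered_genomes: List[Tuple[str, str, Set[str]]]
) -> Dict[str, dict]:
    """Collapse genomes into one pangenome per group.

    ordered_genomes holds (genome, group, K numbers) tuples in organism order,
    where group is the species, genus or family the genome belongs to.
    """
    pangenomes: Dict[str, dict] = {}
    for genome, group, k_numbers in ordered_genomes:
        if group not in pangenomes:
            # The first genome seen in a group is taken as its representative.
            pangenomes[group] = {
                "representative": genome,
                "ko_content": set(k_numbers),
                "added": {},  # genome -> K numbers not yet present in the group
            }
            continue
        entry = pangenomes[group]
        extra = set(k_numbers) - entry["ko_content"]
        if extra:
            # Recorded separately, as if on an additional chromosome or plasmid.
            entry["added"][genome] = extra
            entry["ko_content"] |= extra
    return pangenomes
```

Because later members contribute only K numbers that are still missing, the combined `ko_content` of a group equals the union of its members' K numbers while redundant members add nothing.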

BlastKOALA and GhostKOALA

By the genome annotation procedure in KEGG, the GENES database becomes structured in terms of the KO (K number) groups. This facilitates the processing of sequence similarity search results against the GENES database, which is simply to assign the most appropriate K numbers, as implemented in the automatic annotation services of KAAS (9) and newly released BlastKOALA and GhostKOALA. As shown in Table 3, BlastKOALA and GhostKOALA utilize the pangenome data set, which can be viewed as a non-redundant GENES database after removing similar sequences in similar organisms, but retaining the KO content and the taxonomic diversity. The reduced database size was 55% and 24% for prokaryotes at the species and genus levels, respectively, and 81% and 59% for eukaryotes at the genus and family levels, respectively, as of this writing.

BlastKOALA is suitable for annotating fully sequenced genomes, while GhostKOALA, which uses GHOSTX (10) and runs 100 times faster, is suitable for annotating large data sets such as metagenomes. Both assign K numbers to query amino acid sequences and allow KEGG mapping for interpretation of high-level functions. In BlastKOALA most appropriate K numbers are determined by a method similar to the KOALA program internally used for annotation of KEGG organisms (1). In GhostKOALA only the top scores are examined for K number assignment. One additional feature of GhostKOALA is the assignment of taxonomic compositions. For this purpose the pangenome data set for GhostKOALA is supplemented by sequences selected from CD-HIT clusters (11), adding sequences without K numbers in each taxonomic rank and viral sequences, thus representing the sequence diversity of the GENES database.

TRANSLATIONAL BIOINFORMATICS

Antimicrobial resistance (AMR)

AMR is a universal problem in the management of infectious diseases and complications. Traditionally, the KEGG database contains various contents for infectious diseases and antimicrobial drugs, including KEGG disease pathway maps for infectious diseases, KEGG metabolic pathway maps for biosynthesis of antibiotics, KEGG drug structure maps for the history of antimicrobial drug development and KEGG DRUG entries for all drugs currently in use. Knowledge on AMR mechanisms is now organized in KEGG pathway maps and KEGG modules (Table 4). Furthermore, to meet the practical needs for combating AMR, we have started developing signature modules and signature KOs that can be used to characterize AMR from pathogen genome sequences. Signature modules are a class of KEGG modules, which can be used for linking units of genes in the genome, represented by sets of K numbers, to phenotypic features. Signature modules of drug resistance in pathogens are treated separately with annotation of threat levels defined by CDC (Table 4).

There are also cases where mutations of a single gene play direct roles for AMR. Beta-lactams, the major class of antibiotics, have a long history of newly appearing resistant strains, and in Gram-negative bacteria this is mainly due to mutations of beta-lactamase genes. There have been efforts to collect and classify beta-lactamase mutations (12). We examined about 1200 sequences and concluded that they can be represented by finely classified KOs, named tight KOs, because clear phylogenetic relationships exist for groups of mutated genes. Signature KOs are tight KOs that can be linked to phenotypic features, in this case resistant drug groups. The addendum category of the GENES database now contains beta-lactamase sequences, as well as protein sequences of tetracycline, aminoglycoside and macrolide resistance genes. Figure 1 shows taxonomic distributions of signature KOs for beta-lactamases that are linked to carbapenem resistance, according to the current GENES database. A tool called Pathogen Checker is being developed as a specialized version of the BlastKOALA server for comparing a query pathogen genome against a subset of the GENES database containing sequences of signature KOs and signature modules.

Drug interaction network

The KO database is our attempt to make limited experimental evidence applicable to many other data. Genes and proteins in the GENES database are considered as instances of functional orthologs represented by KOs. By organizing knowledge in terms of generalized (KO-based) networks, high-level functions of individual organisms can be inferred from gene sets in the genome. As shown in Table 5, there are two other network types that are organized in a similar way. One is the chemical reaction network. Enzymatic reactions in the REACTION database are grouped into reaction class (RC) in the RCLASS database, representing the same local structure transformation patterns for substrate-product pairs irrespective of overall structures (13). As previously reported (1,13) one-to-many relationships between reaction modules (ordered sets of RCs) and KEGG modules (sets of KOs) may help to annotate enzyme genes.

The other is the drug interaction network, which is generalized using the newly introduced drug groups (DGs) in the DGROUP database. There are multiple levels of drug groups, the lowest level being the chemical group for the same active ingredient with different salts or hydrates. Many drug interactions are caused by overlapping targets and metabolizing enzymes (14), and appropriate drug groups have been defined. The drug interaction data set in the KEGG DRUG database, which is based on known interactions listed in the drug labels of all marketed drugs in Japan, is being expanded with the DG representation. This will allow better detection of drug interactions associated with contraindications and precautions, as well as duplicate administration of drugs with the same or similar efficacy. Currently, the interaction is defined simply by the pair of D numbers (DRUG identifiers) or DG numbers (DGROUP identifiers). Attempts will be made to incorporate additional factors in the human genome, such as polymorphism of cytochrome P450 (CYP) enzymes and mutation of specific genes, for defining interaction units (Table 5), which may be used for interpretation of drug responses and drug interactions from personal genomes.
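
Since an interaction is defined by a pair of D numbers or DG numbers, a lookup that generalizes from individual drugs to their groups can be sketched as follows. The data layout, the function name `find_interactions` and the identifiers used in the usage example are all hypothetical placeholders; the sketch only illustrates the idea of matching on either level.

```python
from itertools import product
from typing import Dict, FrozenSet, Set


def find_interactions(
    drug_a: str,
    drug_b: str,
    groups_of: Dict[str, Set[str]],
    interactions: Set[FrozenSet[str]],
) -> Set[FrozenSet[str]]:
    """Return the recorded identifier pairs that link two drugs."""
    # Each drug is represented by its D number plus every DG number it belongs to.
    ids_a = {drug_a} | groups_of.get(drug_a, set())
    ids_b = {drug_b} | groups_of.get(drug_b, set())
    return {
        frozenset((a, b))
        for a, b in product(ids_a, ids_b)
        if frozenset((a, b)) in interactions
    }


# Usage with made-up identifiers: one interaction recorded between two drug
# groups, one recorded directly between two drugs.
groups_of = {"D90001": {"DG90001"}, "D90002": {"DG90002"}}
interactions = {
    frozenset(("DG90001", "DG90002")),
    frozenset(("D90001", "D90003")),
}
hits = find_interactions("D90001", "D90002", groups_of, interactions)
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two drugs are given.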

Table 3. BlastKOALA and GhostKOALA for genome and metagenome annotation

Program | KOALA | BlastKOALA | GhostKOALA
URL | - | www.kegg.jp/blastkoala/ | www.kegg.jp/ghostkoala/
Purpose | Internal GENES annotation | Genome annotation | Metagenome annotation
Search program | SSEARCH | BLASTP | GHOSTX
Scoring | Weighted sum of SW scores (KOALA (a)) | Weighted sum of BLAST bit scores (Modified KOALA) | Unweighted sum of GHOSTX scores
Database | All KEGG organisms | Pangenomes | Pangenomes + Viruses

(a) KOALA scoring includes: SW (Smith-Waterman) score, best-best flag, overlap of alignment, ratio of query and DB sequences, taxonomic category and Pfam domains.
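
The scoring row of Table 3 describes K number assignment as a sum of search scores per K number, weighted or unweighted. A minimal sketch of that idea is shown below. The hit layout, the function name `assign_k_number` and the `weight` callback are assumptions made for illustration; the actual KOALA weighting factors listed in the footnote are not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional, Tuple

# (database gene, K number or None, similarity score)
Hit = Tuple[str, Optional[str], float]


def assign_k_number(
    hits: Iterable[Hit],
    weight: Callable[[Hit], float] = lambda hit: 1.0,
) -> Optional[str]:
    """Pick the K number with the largest (optionally weighted) score sum."""
    totals = defaultdict(float)
    for hit in hits:
        _, k_number, score = hit
        if k_number is None:
            continue  # hits to genes without a K number do not contribute
        totals[k_number] += weight(hit) * score
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With the default weight of 1.0 this corresponds to an unweighted sum; passing a different `weight` function stands in for a weighted scheme.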

Table 4. KEGG contents for antimicrobial resistance

Category | Content | Example
Pathway map | Drug resistance pathway | map01501 beta-Lactam resistance
 | | map01502 Vancomycin resistance
Signature module (a) | Resistance caused by: (i) altered target site | M00625 Methicillin resistance
 | (ii) enzymatic inactivation | M00627 beta-Lactam resistance, Bla system
 | (iii) decreased penetration | M00745 Imipenem resistance, repression of porin OprD
 | (iv) increased efflux | M00704 Tetracycline resistance, efflux pump Tet38
Signature KO (a) | Resistance gene mutation groups | Tight KOs for beta-lactamases
Sequence data | Amino acid sequences of known resistance genes | GENES addendum data listed in BRITE table files:
 | | br08453 beta-lactamases
 | | br08456 tetracycline resistance genes
 | | br08454 aminoglycoside resistance genes
 | | br08455 macrolide resistance genes

(a) Signature modules and signature KOs are annotated with the threat level defined by CDC (http://www.cdc.gov/drugresistance/threat-report-2013/) and shown in the BRITE table file (http://www.kegg.jp/kegg/disease/br08451.html).
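
Signature modules link sets of K numbers to phenotypic features, so checking a genome against them amounts to a set comparison. The sketch below models a module as a list of steps, each with alternative K numbers, and calls a module present when every step is covered. This representation, the function name `present_modules` and the identifiers in the usage example are assumptions for illustration and do not reproduce the definitions of the modules in Table 4.

```python
from typing import Dict, List, Set

# A module is modelled as a list of steps; each step lists alternative K numbers.
Module = List[Set[str]]


def present_modules(genome_kos: Set[str], modules: Dict[str, Module]) -> List[str]:
    """Return the modules whose every step is covered by the genome's K numbers."""
    return [
        name
        for name, steps in modules.items()
        if all(step & genome_kos for step in steps)
    ]


# Usage with made-up module names and K numbers.
modules = {
    "M_example_1": [{"K90001"}, {"K90002", "K90003"}],
    "M_example_2": [{"K90004"}],
}
found = present_modules({"K90001", "K90003"}, modules)
```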

Figure 1. Eight signature KOs for beta-lactamases that represent carbapenem resistance are shown indicating which organisms at the genus level contain which genes. This table is generated by the Module Table interface in the KEGG Annotation page (http://www.kegg.jp/kegg/annotation/). The K numbers correspond to the following gene groups: K18768 (KPC), K18793 (OXA-23), K18971 (OXA-24), K18976 (OXA-48), K18794 (OXA-51), K18972 (OXA-58), K19211 (OXA-62) and K18780 (NDM).
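
A table of the kind shown in Figure 1 can be approximated from per-genus K number sets. In the sketch below the K number to gene group mapping is taken from the caption, while the input layout and the function name `signature_table` are hypothetical; it is a simplified stand-in, not the Module Table interface itself.

```python
from typing import Dict, List, Set

# K numbers and gene groups as given in the Figure 1 caption.
CARBAPENEMASE_KOS = {
    "K18768": "KPC",
    "K18793": "OXA-23",
    "K18971": "OXA-24",
    "K18976": "OXA-48",
    "K18794": "OXA-51",
    "K18972": "OXA-58",
    "K19211": "OXA-62",
    "K18780": "NDM",
}


def signature_table(genus_kos: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each genus to the gene groups of the signature KOs found in it."""
    table = {}
    for genus, kos in genus_kos.items():
        found = [name for ko, name in CARBAPENEMASE_KOS.items() if ko in kos]
        if found:  # genera without any signature KO are left out
            table[genus] = found
    return table
```

Gene groups are reported in the order in which the caption lists them, which keeps rows comparable across genera.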

Table 5. Three types of molecular networks in KEGG

Type | Instance (Database) | Class (Database) | Abbreviation | Functional unit
Gene/protein network | Gene/protein (KEGG GENES) | Ortholog (KEGG ORTHOLOGY) | KO | KEGG module
Chemical reaction network | Reaction (KEGG REACTION) | Reaction class (KEGG RCLASS) | RC | Reaction module
Drug interaction network | Drug (KEGG DRUG) | Drug group (KEGG DGROUP) | DG | Interaction unit

Accessing KEGG

KEGG is made available at both the KEGG main website (http://www.kegg.jp/) and the GenomeNet mirror website (http://www.genome.jp/kegg/). BlastKOALA and GhostKOALA are maintained in the main website, while KAAS, SIMCOMP and many other tools are maintained in the GenomeNet website, which also develops the LinkDB and MGENES databases.

ACKNOWLEDGEMENTS

Computational resources were provided by the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan.

FUNDING

National Bioscience Database Center of the Japan Science and Technology Agency (partial). Funding for open access charge: National Bioscience Database Center of the Japan Science and Technology Agency.
Conflict of interest statement. None declared.

REFERENCES

1. Kanehisa,M., Goto,S., Sato,Y., Kawashima,M., Furumichi,M. and Tanabe,M. (2014) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res., 42, D199–D205.
2. Pruitt,K.D., Tatusova,T., Brown,G.R. and Maglott,D.R. (2012) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res., 40, D130–D135.
3. Benson,D.A., Clark,K., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Sayers,E.W. (2015) GenBank. Nucleic Acids Res., 43, D30–D35.
4. Kanehisa,M. (2013) Chemical and genomic evolution of enzyme-catalyzed reaction networks. FEBS Lett., 587, 2731–2737.
5. McDonald,A.G. and Tipton,K.F. (2014) Fifty-five years of enzyme classification: advances and difficulties. FEBS J., 281, 583–592.
6. Tatusova,T., Ciufo,S., Federhen,S., Fedorov,B., McVeigh,R., O’Neill,K., Tolstoy,I. and Zaslavsky,L. (2015) Update on RefSeq microbial genomes resources. Nucleic Acids Res., 43, D599–D605.
7. UniProt Consortium. (2015) UniProt: a hub for protein information. Nucleic Acids Res., 43, D204–D212.
8. Federhen,S. (2012) The NCBI Taxonomy database. Nucleic Acids Res., 40, D136–D143.
9. Moriya,Y., Itoh,M., Okuda,S., Yoshizawa,A. and Kanehisa,M. (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res., 35, W182–W185.
10. Suzuki,S., Kakuta,M., Ishida,T. and Akiyama,Y. (2014) GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array. PLoS One, 9, e103833.
11. Li,W. and Godzik,A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22, 1658–1659.
12. Bush,K. and Jacoby,G.A. (2010) Updated functional classification of beta-lactamases. Antimicrob. Agents Chemother., 54, 969–976.
13. Muto,A., Kotera,M., Tokimatsu,T., Nakagawa,Z., Goto,S. and Kanehisa,M. (2013) Modular architecture of metabolic pathways revealed by conserved sequences of reactions. J. Chem. Inf. Model., 53, 613–622.
14. Takarabe,M., Shigemizu,D., Kotera,M., Goto,S. and Kanehisa,M. (2011) Network-based analysis and characterization of adverse drug-drug interactions. J. Chem. Inf. Model., 51, 2977–2985.
