Overview of Ontologies: Ontology Is Defined As A "Formal Specification of A Conceptualization."

The document provides an overview of ontologies and the Gene Ontology (GO) project. It describes ontologies as formal specifications of conceptualizations that systematically describe domains of interest. The GO aims to develop cross-species biological vocabularies for annotating genes and gene products consistently. It comprises three ontologies describing molecular function, biological processes, and cellular components that are common across living organisms. The goals are to compile comprehensive structured vocabularies and describe biological objects across databases using consistent GO terms.

Uploaded by

Raazia Mir
Copyright
© Attribution Non-Commercial (BY-NC)

Overview of Ontologies

There's an endless variety of world views, and almost as many ways to organize and describe them.

The root of the term is the Greek ontos, meaning being, the nature of things, or the nature of existence. Tom Gruber, among others, made the term popular in relation to computer science and artificial intelligence about 15 years ago.

Ontology is defined as a "formal specification of a conceptualization."


- A means of viewing, organizing, conceptualizing, and defining a domain of interest. Ontology is a systematic account of existence. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.

What we understand by ontology


Ontology has one basic question: "What are the fundamental categories of being?" Questions such as "What exists?", "What is?", "What am I?", and "What is describing this to me?" all exemplify questions about being, and highlight the most basic problems in ontology: finding a subject, a relationship, and an object to talk about. Different philosophers make different lists of such fundamental categories of being. This highlights one of the problems of the philosophical approach: it relies on continued investigation of categories, and has no clear

History of ontology
Ontology is the theory of being as such. It was originally called first philosophy by Aristotle. In the 18th century Christian Wolff contrasted ontology, or general metaphysics, with special metaphysical theories of souls, bodies, or God, claiming that ontology could be a deductive discipline revealing the essences of things. This view was later strongly criticized by David Hume and Immanuel Kant. Ontology was revived in the early 20th century by practitioners of phenomenology and existentialism, notably Edmund Husserl and his student Martin Heidegger. In the English-speaking world, interest in ontology was renewed in the mid-20th century by W.V.O. Quine; by the end of the century it had become a central discipline of analytic philosophy. Related topics: idealism; realism; universal.

Schematic ontology diagram

Overview and Role of Ontologies


Of course, a fancy name alone is not sufficient to warrant an interest in ontologies. There are reasons why understanding, using and manipulating ontologies can bring practical benefit:
- Depending on their degree of formalism (an important dimension), ontologies help make explicit the scope, definition, language and meaning (semantics) of a given domain or world view.
- Ontologies may provide the power to generalize about their domains.
- Ontologies, if hierarchically structured in part (and not all are), can provide the power of inheritance.
- Ontologies provide guidance for how to correctly "place" information in relation to other information in that domain.
- Ontologies may provide the basis to reason or infer over their domain (again as a function of their formalism).
- Ontologies can provide a more effective basis for information extraction or content clustering.
- Ontologies, again depending on their formalism, may be a source of structure and controlled vocabularies helpful for disambiguating context; they can inform and provide structure to the "lexicons" in particular domains.
- Ontologies can provide guiding structure for browsing or discovery within a domain.
- Ontologies can help relate and "place" other ontologies or world views in relation to one another; in other words, ontologies can organize ontologies from the most

Hierarchy in ontology

Common components of ontologies


- Individuals: instances or objects (the basic or "ground level" objects)
- Classes: sets, collections, concepts or types of objects
- Attributes: properties, features, characteristics, or parameters that objects (and classes) can have
- Relations: ways that classes and objects can be related to one another
- Function terms: complex structures formed from certain relations that can be used in place of an individual term in a statement
- Restrictions: formally stated descriptions of what must be true in order for some assertion to be accepted as input
- Rules: statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form
- Axioms: assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes, for its domain of application
- Events: the changing of attributes or relations

Components of ontologies: an example


Classes

Relations

Attributes
Name: Ford Explorer
Number-of-doors: 4
Engine: {4.0L, 4.6L}
Transmission: 6-speed
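
The components listed earlier (classes, individuals, attributes, and the is-a hierarchy) can be sketched in a few lines of Python. This is only an illustration: the class names Vehicle, Car and SUV and the helper names are assumptions made for the example, and only the Ford Explorer attribute values come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """A class (concept): a type of object, optionally with a parent class."""
    name: str
    parent: Optional["OntologyClass"] = None

    def ancestors(self):
        """Return the names of all superclasses, nearest first."""
        found = []
        current = self.parent
        while current is not None:
            found.append(current.name)
            current = current.parent
        return found


@dataclass
class Individual:
    """An individual: a ground-level instance of a class, with attributes."""
    name: str
    cls: OntologyClass
    attributes: dict = field(default_factory=dict)

    def is_a(self, class_name):
        """True if the individual's class, or any superclass, has this name."""
        return class_name == self.cls.name or class_name in self.cls.ancestors()


# Hypothetical class hierarchy (not taken from the slide): Vehicle > Car > SUV.
vehicle = OntologyClass("Vehicle")
car = OntologyClass("Car", parent=vehicle)
suv = OntologyClass("SUV", parent=car)

# Attribute values taken from the slide's Ford Explorer example.
ford_explorer = Individual(
    name="Ford Explorer",
    cls=suv,
    attributes={
        "Number-of-doors": 4,
        "Engine": {"4.0L", "4.6L"},
        "Transmission": "6-speed",
    },
)

print(ford_explorer.is_a("Vehicle"))
print(suv.ancestors())
```

Walking the parent links is what gives the hierarchy its "power of inheritance": anything stated about Vehicle can be applied to the Ford Explorer without restating it.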

Ontologies in Biology
The Protein Ontology (PO) provides a unified vocabulary for capturing declarative knowledge about the protein domain and for classifying that knowledge to allow reasoning.

SBO is the Systems Biology Ontology project, another cornerstone of the BioModels.net effort. The goal of SBO is to develop controlled vocabularies and ontologies tailored specifically to the kinds of problems being faced in systems biology, especially in the context of computational modeling.

The main objective of the Plant Ontology Consortium (POC) is to develop, curate and share controlled vocabularies (ontologies) that describe plant structures and growth and developmental stages, providing a semantic framework for meaningful cross-species queries across databases.

The Gene Ontology project, or GO, provides a controlled vocabulary to describe gene and gene product attributes in any organism. It can be broadly split into two parts. The first is the ontology itself, which is actually three ontologies, each representing a key concept in molecular biology: the molecular function of gene products; their role in multi-step biological processes; and their localization to cellular components.

Gene Ontology- Introduction


The Gene Ontology was originally constructed in 1998 by a consortium of researchers studying the genome of three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Saccharomyces cerevisiae (brewers' or bakers' yeast). Many other model organism databases have joined the Gene Ontology consortium, contributing both annotations for the genes of one or more organisms and also contributing to the development of the ontologies. As of January 2008, GO contains over 24,500 terms applicable to a wide variety of biological organisms. There is a significant body of literature on the development and use of GO, and it has become a standard tool in the bioinformatics arsenal.

GO example

Gene Ontology terms


Each GO term consists of a unique alphanumerical identifier, a common name, synonyms (if applicable), and a definition. When a term has multiple meanings depending on species, the GO uses a "sensu" tag to differentiate among them. New terms and annotations are suggested by members of the research and annotation communities. Once submitted, they are reviewed by members of the GO consortium to determine their applicability. If it is decided that a term in the ontology is not appropriate, it is deprecated, or marked as "obsolete". This can happen for a number of reasons, such as being outside the scope of the ontology or being misleadingly named or defined. The ontology file is freely available from the GO website; the terms can be searched and browsed online using the GO browser AmiGO. The Gene Ontology project also provides mappings of its terms to other classification systems covering the same areas of biology.
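
As a rough illustration of the term structure described above (identifier, name, synonyms, definition, obsolete flag), the sketch below reads one simplified stanza written in the style of the OBO flat-file format. The stanza is deliberately simplified and the term GO:9999999 is an invented placeholder, not a real GO term; the class and function names are assumptions for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """The parts of a GO term named in the text."""
    term_id: str
    name: str
    definition: str = ""
    synonyms: list = field(default_factory=list)
    obsolete: bool = False


def parse_term_stanza(text):
    """Parse one simplified OBO-style [Term] stanza into a GOTerm."""
    fields = {"synonym": []}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        # Split on the first colon only, so "id: GO:9999999" keeps its value.
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip().strip('"')
        if tag == "synonym":
            fields["synonym"].append(value)
        else:
            fields[tag] = value
    return GOTerm(
        term_id=fields["id"],
        name=fields["name"],
        definition=fields.get("def", ""),
        synonyms=fields["synonym"],
        obsolete=fields.get("is_obsolete", "false") == "true",
    )


# Invented placeholder stanza; real files carry extra qualifiers per line.
stanza = """
[Term]
id: GO:9999999
name: example process
def: "An invented definition used only for illustration."
synonym: "sample process"
is_obsolete: true
"""

term = parse_term_stanza(stanza)
print(term)
```

Keeping the obsolete flag on the record, rather than deleting the term, mirrors the practice described above: deprecated terms stay identifiable but are marked as no longer appropriate.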

Aims of Gene Ontology


- GO endeavors to develop cross-species biological vocabularies that are used by multiple databases to annotate genes and gene products in a consistent way.
- Annotations made with GO vocabulary terms are incorporated into the respective model organism databases.
- GO is building three extensive ontologies to describe molecular function, biological process, and cellular component. These are common to all living forms and are basic to our annotation of information.
- One important feature is that the development of the GO vocabularies is independent of the association of particular gene products with GO terms.
- GO contributes to the unification of biological information.

The Three Ontologies


1. The Ontology of Molecular Function
Molecular function is defined as what a gene product does at the biochemical level. It describes only what is done without specifying where or when the event actually occurs or its broader context. Examples of broad functional terms are "enzyme," "transporter," or "ligand."

2. The Ontology of Biological Process
Biological process refers to a biological objective to which the gene product contributes. A process is accomplished via one or more ordered assemblies of functions. It often involves transformation in the sense that something goes into a process and something different comes out of it. Examples of broad biological process terms are "cell growth and maintenance" or "signal transduction." Examples of more specific terms are "pyrimidine metabolism" or "cAMP biosynthesis."

3. The Ontology of Cellular Component
Cellular component refers to the place in the cell where a gene product is found. These terms reflect our understanding of cell structure in a generic sense. Cellular component includes terms describing complexes where multiple gene products would be found, such as the "ribosome" or "proteasome." It also includes terms such as "nuclear membrane" or "Golgi apparatus." Thus, the term "cellular

Goals of the Gene Ontology


1. To compile a comprehensive structured vocabulary of terms describing different elements of molecular biology that are shared among life forms.
- Terms are defined, may have synonyms and are organized into broader and narrower refinements.
- Separate vocabularies are used to define separate dimensions of biology.
2. To describe biological objects (in the model organism database of each contributing member) using these terms.
3. To provide tools for querying and manipulating these vocabularies.
- To add new vocabularies for additional aspects of biology.
- To permit researchers to locate both terms and biological objects either via the Web or in more complex ways.
- To allow others to set up satellite databases.
4. To provide tools enabling curators to assign GO terms to biological objects.
- Sequence-based methods
- Editorial annotations
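
Goals 2 and 3 (describing biological objects with GO terms, then querying them) can be pictured with a small in-memory store. This is a sketch under stated assumptions, not how any GO tool is implemented: the class name AnnotationStore, its methods, and the gene product names geneA, geneB and geneC are all hypothetical; the term names are taken from the examples in the slides.

```python
from collections import defaultdict

# The three GO ontologies, used here as annotation aspects.
ASPECTS = ("molecular_function", "biological_process", "cellular_component")


class AnnotationStore:
    """Hypothetical store linking gene products to terms, per aspect."""

    def __init__(self):
        self._by_term = defaultdict(set)
        self._by_product = defaultdict(set)

    def annotate(self, product, aspect, term):
        """Record that a gene product is described by a term."""
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        self._by_term[(aspect, term)].add(product)
        self._by_product[product].add((aspect, term))

    def products_for(self, aspect, term):
        """All gene products annotated with the given term, sorted by name."""
        return sorted(self._by_term.get((aspect, term), set()))

    def terms_for(self, product):
        """All (aspect, term) pairs attached to one gene product."""
        return sorted(self._by_product.get(product, set()))


store = AnnotationStore()
# geneA, geneB and geneC are invented names used only for this example.
store.annotate("geneA", "biological_process", "signal transduction")
store.annotate("geneB", "biological_process", "signal transduction")
store.annotate("geneA", "molecular_function", "transporter")
store.annotate("geneC", "cellular_component", "ribosome")

print(store.products_for("biological_process", "signal transduction"))
print(store.terms_for("geneA"))
```

Because the vocabulary is shared, the same query works regardless of which database or organism contributed the annotation.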

WHAT GO IS NOT
1. GO is not a way to unify biological databases. Sharing nomenclature is a step toward unification, but is not, in itself, sufficient.
2. GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest and cooperate to arrive at a consensus.
3. GO does not define homologies between gene products from different organisms. The use of the GO results in shared annotations for gene products from different organisms, and this may reflect an evolutionary relationship, but the shared annotation is in itself not sufficient for such a determination.
4. GO does not allow us to describe genes in terms of which cells or tissues they're expressed in, which developmental stages they're expressed at, or their involvement in disease. It is not necessary for GO to do these things because other ontologies are being developed for these purposes.

Status of GO, end of 2007


Biological process terms: 13,916
Molecular function terms: 7,878
Cellular component terms: 2,007
Sequence ontology terms: 1,305
Annotation datasets: 35
Species with annotation: 137,454
Annotated gene products:
- Total: 3,347,495
- Electronic: 3,128,309
- Manual: 219,186

Gene Ontology Project Online Resources

Browsers
- AmiGO: http://amigo.geneontology.org
- miSO: http://sequenceontology.org/miSO/index.html

SourceForge trackers
- AmiGO: https://sourceforge.net/tracker/?group_id=36855&atid=494390
- GO: https://sourceforge.net/tracker/?group_id=36855&atid=440764
- SO: http://sourceforge.net/tracker/?group_id=72703&atid=810408

Other Useful Web Pages
- GO: http://www.geneontology.org/GO.downloads.shtml
- SO: http://sequenceontology.org/
- GO project: http://www.geneontology.org/GO.contents.doc.shtml
- GO database: http://www.geneontology.org/GO.database.shtml
- GO format: http://www.geneontology.org/GO.format.shtml
- GO software: http://www.geneontology.org/GO.tools.software-libraries
- Reference Genome annotations (graphical views): http://www.geneontology.org/images/RefGenomeGraphs/
- GO public wiki: http://wiki.geneontology.org/

Mailing lists

Open Biomedical Ontologies


Open Biomedical Ontologies (formerly Open Biological Ontologies) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO forms part of the resources of the U.S. National Center for Biomedical Ontology, where it will form a central element of the NCBO's BioPortal.

OBO Foundry

The OBO Ontology library forms the basis of the OBO Foundry, a collaborative experiment involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. These principles are designed to foster interoperability of ontologies within the broader OBO framework, and also to ensure a gradual improvement of quality and formal rigor in ontologies, in ways designed to meet the increasing needs of data and information integration in the biomedical domain.

Related Projects

- Ontology Lookup Service: The Ontology Lookup Service (OLS) is a spin-off of the PRIDE project, which required a centralized query interface for ontology and controlled vocabulary lookup. While many of the ontologies queriable by the OLS are available online, each has its own query interface and output format. The OLS provides a web service interface to query multiple ontologies from a single location with a unified output format.
- Gene Ontology Consortium: The goal of the Gene Ontology (GO) consortium is to produce a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing. GO provides three structured networks of defined terms to describe gene product attributes.
- Sequence Ontology: The Sequence Ontology (SO) is a part of the Gene Ontology project; its aim is to develop an ontology suitable for describing biological sequences. It is a joint effort by genome annotation centres, including WormBase, the Berkeley Drosophila Genome Project, FlyBase, the Mouse Genome Informatics group, and the Sanger Institute.
- Generic Model Organism Databases: The Generic Model Organism Project (GMOD) is a joint effort by the model organism system databases WormBase, FlyBase, MGI, SGD, Gramene, Rat Genome Database, EcoCyc, and TAIR to develop reusable components suitable for creating new community databases of biology.
- Standards and Ontologies for Functional Genomics: SOFG is both a meeting and a website; it aims to bring together biologists, bioinformaticians, and computer scientists who are developing and using standards and ontologies with an emphasis on describing high-throughput functional genomics experiments.
- MGED: The Microarray Gene Expression Data (MGED) Society is an international organisation of biologists, computer scientists, and data analysts that aims to facilitate the sharing of microarray data generated by functional genomics and proteomics experiments.

Ontologies in bioinformatics are intended to capture and formalize a domain of knowledge,

OBO is an umbrella organization for structured shared controlled vocabularies and ontologies for use within the genomics and proteomics domains. Of the criteria that ontologies must currently satisfy if they are to be included in the OBO library, the most important for our purposes are: first, inclusion of textual definitions or descriptions designed to ensure that the precise meanings of terms as used within particular ontologies will be clear to a human reader; second, employment of a standard syntax, such as the OWL or OBO flatfile syntax; third, orthogonality to the other ontologies already included in the library.

These criteria are designed to support the integration of OBO ontologies, above all by ensuring the compatibility of ontologies pertaining to an identical subject matter. OBO has now added a fourth criterion to assist in achieving such compatibility, namely that the relations (edges) used to connect terms in OBO ontologies should be applied in ways consistent with their definitions as set forth in this paper. The Relation Ontology offered here is designed to put flesh on this criterion.

How, exactly, should part_of or located_in be defined in order to ensure maximally reliable curation of each single ontology while at the same time guaranteeing maximal leverage in building a solid base for life-science knowledge integration in general? We describe a rigorous methodology for providing an answer to this question and illustrate its use in the construction of an easily extendible list of ten relations of a type familiar to those working in the bio-ontological field. This list forms the core of the new OBO Relation Ontology. What is distinctive about our methodology is that, while the relations are each provided with rigorous formal definitions, these definitions can at the same time be formulated in such a way that the underlying technical details remain invisible to ontology authors and curators.
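
One reason consistent relation definitions matter is that software draws inferences from them. The sketch below assumes that part_of is treated as transitive (if A is part_of B and B is part_of C, then A is part_of C) and computes every implied pair from a set of direct edges. The function name and the three cell-structure edges are illustrative choices for this example, not content from the Relation Ontology itself.

```python
def transitive_closure(edges):
    """Given direct (child, parent) part_of edges, return all implied pairs."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for start in parents:
        # Depth-first walk upward from each node that has a parent.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# Illustrative direct part_of edges (chosen for this example).
direct_part_of = [
    ("nucleolus", "nucleus"),
    ("nucleus", "cell"),
]

implied = transitive_closure(direct_part_of)
print(sorted(implied))
```

If one ontology used part_of loosely (for example, to mean "sometimes found in"), the inferred pair linking nucleolus to cell would no longer be safe, which is the kind of problem the fourth criterion is meant to prevent.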

TAMBIS Ontology
The TAMBIS Ontology (TaO) is a conceptual representation of biological concepts and terminology. Its aim is to capture biological and bioinformatics knowledge in a logical conceptual framework that is constrained in such a way that i) only biologically sensible concepts classify correctly, ii) it can encompass different user views, and iii) it makes biological concepts and their relationships computationally accessible.

TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) uses an ontology to enable biologists to ask questions over multiple external databases using a common query interface [1]. The TaO [19] describes a wide range of bioinformatics tasks and resources, and has a central role within the TAMBIS system.

An interesting difference between the TaO and some of the other ontologies reviewed here is that the TaO does not contain any instances. The TaO only contains knowledge about bioinformatics and molecular biology concepts and their relationships; the instances they represent still reside in the external databases. As concepts represent instances, a concept can act as a question. The concept Receptor Protein represents the instances of proteins with a receptor function, and gathering these instances is answering that question.

The TaO is a dynamic ontology, in that it can grow without the need for either conceptualising or encoding new knowledge. In contrast, the other ontologies described here are static: developers must intervene and encode new conceptualisation to form new concepts. The TaO uses rules within the ontology to govern what concepts can be joined to another concept via relationships, to form new concepts. Thus the TaO places great emphasis on relations.

A user can form a complex, multisource query, using relationships, in the following manner. Starting with the concept Protein, the TaO is consulted as to which relationships can be used to join Protein to other concepts. Amongst many, the following two are offered: isHomologousTo Protein and hasAccessionNumber AccessionNumber. First, the original Protein is extended to give a new concept, Protein isHomologousTo Protein (the concept of a Protein homologue of a Protein); then the second 'Protein' is extended with hasAccessionNumber AccessionNumber. The resulting concept ('Protein homologue of Protein with Accession Number') describes proteins which are homologous to a protein with a particular accession number. This concept can be used as a source-independent query containing no information on how to answer such a query. The rest of the TAMBIS system takes this conceptual query and processes it into an executable program against the external sources [20].

The TaO is available in two forms: a small model that concentrates on proteins and a larger scale model that includes nucleic acids. The small TaO, with 250 concepts and 60 relationships, describes proteins and enzymes, as well as their motifs, secondary and tertiary structure, functions and processes. There is also supporting material on subcellular structure and chemicals, including cofactors. Motifs extend to detail such as the principal modification sites; function and process to broad classifications such as Hormone and Receptor, and Apoptosis and Lactation; structure extends to detail such as gross architecture, for example SevenPropellor. Important relationships include is component of, has name, has function and is homologous to, as well as many more. The larger model, with 1500 concepts, broadens these areas to include concepts pertinent to nucleic acid, its children and genes.

TAMBIS aims to aid researchers in biological science by providing a single access point for biological information sources round the world. The access point will be a single interface (via the World Wide Web) which acts as a single information source. It will find appropriate sources of information for user queries and phrase the user questions for each source, returning the results in a consistent manner which will include details of the information source.

Ontologies provide a powerful mechanism for making conceptual information about biology computationally available. Ontologies therefore provide one mechanism by which conceptual information can be attached to the current flood of biological data and thereby help turn data into useful biological knowledge.

The ontology currently contains around 1800 asserted concepts. The concepts covered and the sources with which they are associated are shown below, along with examples of GRAIL constructs in which the concepts are used:

- Protein and protein sequence (from SWISS-PROT, [Bairoch et al., 1996]), protein component motifs (from PROSITE, [Bairoch et al. 1997]), protein structure (as classified by CATH [Orengo et al., 1997]) and enzyme function (as defined in Prosite, and the Enzymes and Metabolic Pathways database - EMP, [Selkov et al., 1996]). We can therefore build concepts such as the tertiary structures of proteins which contain motifs that are involved in hydrolase activity:
    TertiaryStructure which isStructureOf (Protein which hasComponent (Motif which indicatesFunction Hydrolase))
- Enzymes and metabolic pathways (as defined in the Enzyme database, [Bairoch, 1996]). This allows the construction of queries regarding enzymes and their reactions, for example enzymes which catalyse reactions which occur in the metabolism of thymine:
    Enzyme which catalyses (Reaction which occursIn (Metabolism which isMetabolismOf Thymine))
- Expressed sequence tags (as defined by dbEST, [Boguski et al., 1993]). We can therefore create the concept of ESTs that code for proteins that contain glycosylation sites:
    EST which codesFor (Protein which hasComponent GlycosylationSite)
- Nucleic acids, their component motifs, gene function and expression [Stoesser et al. 1997, Stoesser et al. 1998]. The concept given below should be relatively self-explanatory:
    Gene which codesFor (Protein which hasFunction TransmembraneTransport)
- Sequence homology (BLAST, [Altschul et al., 1990]). Using ideas of homology we can create concepts linked to specific bioinformatics processes, for example the concept of the set of proteins homologous to a protein with a specific accession number:
    Protein which isHomologousTo (Protein which hasAccessionNumber P12345)
- Taxonomy (as defined at the NCBI web site [NCBI]):
    TaxonomicRank which < isRankOf PoeciliaReticulata isRankOf AmoebaProteus>
  i.e. the taxonomic rank common to both Poecilia reticulata and Amoeba proteus.
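
The way these concepts are built up, by joining one concept to another through a named relationship, can be mimicked with a small recursive data structure. The sketch below is not GRAIL and not part of TAMBIS; the Concept class and its which and render methods are hypothetical names chosen so that the rendered text follows the single-relationship "X which relation (Y ...)" pattern of the constructs quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A concept name, optionally refined by one relation to another concept."""
    name: str
    relation: Optional[str] = None
    target: Optional["Concept"] = None

    def which(self, relation, target):
        """Return a new, more specific concept; the original is unchanged."""
        return Concept(self.name, relation, target)

    def render(self, nested=False):
        """Write the concept in the 'X which relation (Y ...)' style."""
        if self.relation is None:
            return self.name
        text = f"{self.name} which {self.relation} {self.target.render(nested=True)}"
        return f"({text})" if nested else text


# Rebuild two of the concepts quoted in the text.
homologue_query = Concept("Protein").which(
    "isHomologousTo",
    Concept("Protein").which("hasAccessionNumber", Concept("P12345")),
)

hydrolase_structures = Concept("TertiaryStructure").which(
    "isStructureOf",
    Concept("Protein").which(
        "hasComponent",
        Concept("Motif").which("indicatesFunction", Concept("Hydrolase")),
    ),
)

print(homologue_query.render())
print(hydrolase_structures.render())
```

A real system would also check each join against rules about which relationships a concept may take, as the TaO does; this sketch accepts any combination.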
