Ontologies in biology: design, applications and future challenges



Box 1 | Ontologies: rules and representation

Representing ontologies

Although ontologies might seem to be abstract entities, it is usually possible to illustrate them as graphs in which vertices (nodes, leaves) and edges (lines connecting the nodes) represent the terms and the rules of the ontology, respectively (Ref. 29). For bio-ontologies, this graph is usually no more than a hierarchy: this will be simple if each term has a single parent (such as in taxonomy; panel a) and more complicated if a term has two or more parents or relationships (panel b). An example of the latter would be the Gene Ontology (GO) (see Table 1).

The details of the graph also depend on whether the relationships are 'directed' or not. 'Directed' relationships (as shown in panels a and b) imply a parent–child link between the concepts: if A is a child of B, then we would expect that B is not a child of A. By contrast, 'undirected' relationships carry no such implication: if A is next to B, then B is also next to A (panel c). If all the relationships in a valid ontology are directed in this parent–child sense, they cannot form closed loops, and the ontology can be represented by a directed acyclic graph (DAG; panel b).
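The graph view described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the terms are a toy example rather than entries from GO, and the helper name `is_acyclic` is hypothetical. Each directed link is stored as child → list of parents, so a term with two parents is simply a list with two entries, and a depth-first search checks that the links form no closed loop.

```python
# Toy ontology: each term maps to its list of parents (directed child -> parent links).
# Term names are illustrative only and the relationship types are deliberately mixed.
parents = {
    "humerus": ["bone", "arm"],   # two parents, so the graph is a DAG rather than a tree
    "bone": ["tissue"],
    "arm": ["limb"],
    "tissue": [],
    "limb": [],
}

def is_acyclic(graph):
    """Return True if the directed graph (node -> list of parents) has no closed loops."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for parent in graph.get(node, []):
            if state.get(parent, WHITE) == GREY:
                return False              # reached a node already on the current path
            if state.get(parent, WHITE) == WHITE and not visit(parent):
                return False
        state[node] = BLACK
        return True

    return all(visit(node) for node in graph if state[node] == WHITE)

# Terms with more than one parent are what make the hierarchy "more complicated".
multi_parent_terms = [term for term, ps in parents.items() if len(ps) > 1]
```

Here `is_acyclic(parents)` holds for the toy hierarchy, whereas a pair of terms that name each other as parents would fail the check.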

The transitivity rule

One important aspect of the assertions and rules that together define the ontology is that they can be used to make logical inferences about the terms and their associated properties. An assertion that connects C to B together with one that connects B to A implies that the same relationship connects C to A; the logic of this inference process is defined by the 'transitivity' rule. To illustrate this with the anatomical example given in the text, the humerus: is 'part of' the arm; 'has cell type' osteoblast; 'has adhesion points' for muscles; and 'is a' bone. In this example, 'part of' is transitive, and the properties 'has cell type' and 'has adhesion points' can be inferred to hold for the whole, B, if they hold for the part, C. By transitivity, these properties will also hold for A if B is part of A; that is, the arm includes all the cell types and expressed genes of each of its constituent tissues. By contrast, 'descends from' is not transitive and no deduction about the child can be made on the basis of the parent. (The reader should note that this analysis of the part-of relationship, or 'mereology', is highly simplified; see Ref. 5.)
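The inference described above can be sketched as follows. This is a simplified illustration, not a general reasoner: it assumes each term has at most one direct whole and that the 'part of' links contain no loops, and the term "upper limb" and both function names are assumptions added for the example.

```python
# Direct 'part of' assertions (part -> whole); "upper limb" is an illustrative addition.
part_of = {"humerus": "arm", "arm": "upper limb"}

# Properties asserted directly on a term.
properties = {"humerus": {"has cell type: osteoblast"}}

def wholes_of(term, part_of):
    """All wholes that contain `term`, following 'part of' transitively."""
    result = []
    while term in part_of:                # assumes the links contain no closed loops
        term = part_of[term]
        result.append(term)
    return result

def inferred_properties(term, part_of, properties):
    """Properties asserted on `term` plus those inferred from all of its parts."""
    found = set(properties.get(term, set()))
    for part in part_of:
        if term in wholes_of(part, part_of):
            found |= properties.get(part, set())
    return found
```

With these assertions, the cell type recorded for the humerus is inferred for the arm and, by transitivity, for the upper limb, although it was stated only once.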

The is-a rule is also transitive but in the opposite direction: for example, individual bones have specific features that are not common to all bones (only the humerus has a radial groove). In terms of the previous example, if A, B and C are linked by is-a relationships, the appropriate properties of A can be associated with B and the properties of both B and A with C. Figure reproduced with permission from Ref. 29 © (2003) Wiley.
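Inheritance along is-a links runs the other way, from the general term down to the specific one, and can be sketched as below. The intermediate term "long bone", the property strings and the function name `inherited_properties` are assumptions made for illustration, and each term is given a single is-a parent to keep the example short.

```python
# Direct 'is a' assertions (specific -> general); "long bone" is an illustrative addition.
is_a = {"humerus": "long bone", "long bone": "bone"}

# Properties asserted directly on each term.
properties = {
    "bone": {"has cell type: osteoblast"},
    "long bone": {"has a shaft"},
    "humerus": {"has radial groove"},
}

def inherited_properties(term, is_a, properties):
    """Properties of `term` plus those of every more general term above it."""
    found = set()
    while term is not None:
        found |= properties.get(term, set())
        term = is_a.get(term)             # None once the most general term is reached
    return found
```

The humerus collects the properties of all three terms, whereas "bone" keeps only its own: the radial groove does not propagate upwards.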


Discussion paper by Michael Ashburner on phenotype and trait ontology | Minutes from phenotype meetings | Database groups that participated in phenotype meetings: The Arabidopsis Information Resource | Berkeley Drosophila Genome Project | dictyBase | FlyBase | Gramene | International Crop Information System | The Institute for Genomic Research (microbial systems) | The London Dysmorphology Database | MaizeGDB | Mouse Anatomy | Mouse Genome Informatics | Mouse mutagenesis centres | Nugene | OMIM | Rat Genome Database | Saccharomyces Genome Database



1. D'Souza, D. The Virtue of Prosperity: Finding Values in an Age of Techno-Affluence (Simon and Schuster, Inc., New York, 2000).

2. Baxevanis, A. D. (ed.). Current Protocols in Bioinformatics (Wiley, New York, 2002).

3. van Heijst, G., Schreiber, A. & Wielinga, B. Using explicit ontologies in KBS development. Int. J. of Human-Computer Studies 46, 183–292 (1997).

4. Stein, L. D. Integrating biological databases. Nature Rev. Genet. 4, 337–345 (2003).

5. Simons, P. Parts: A Study in Ontology (Oxford Univ. Press, Oxford, UK, 1987).

6. Twigger, S. et al. Rat Genome Database (RGD): mapping disease onto the genome. Nucleic Acids Res. 30, 125–128 (2002).

7. Garcia-Hernandez, M. et al. TAIR: a resource for integrated Arabidopsis data. Funct. Integr. Genomics 2, 239–253 (2002).

8. Lawrence, C. J., Dong, Q., Polacco, M. L., Seigfried, T. E. & Brendel, V. MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res. 32, D393–D397 (2004).

9. Drysdale, R. Phenotypic data in FlyBase. Brief Bioinform. 2, 68–80 (2001).
An early example of the use of multiple ontologies to describe phenotype.

10. Ware, D. H. et al. Gramene, a tool for grass genomics. Plant Physiol. 130, 1606–1613 (2002).

11. Blake, J. A., Richardson, J. E., Bult, C. J., Kadin, J. A. & Eppig, J. T. MGD: the Mouse Genome Database. Nucleic Acids Res. 31, 193–195 (2003).

12. Schofield, P. N. et al. Pathbase: a database of mutant mouse pathology. Nucleic Acids Res. 32, D512–D515 (2004).

13. Krieger, C. J. et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 32, D438–D442 (2004).

14. Hewett, M. et al. PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res. 30, 163–165 (2002).

15. Hill, D. P., Blake, J. A., Richardson, J. E. & Ringwald, M. Extension and integration of the gene ontology (GO): combining GO vocabularies with external vocabularies. Genome Res. 12, 1982–1991 (2002).
Proposes a way to generate more specific ontologies by combining concepts from two orthogonal ontologies.

16. Harhay, G. P. & Keele, J. W. Positional candidate gene selection from livestock EST databases using Gene Ontology. Bioinformatics 19, 249–255 (2003).

17. Lin, J. et al. GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing. Nucleic Acids Res. 30, 4574–4582 (2002).

18. Draghici, S. et al. Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res. 31, 3775–3781 (2003).

19. Christie, K. R. et al. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 32, D311–D314 (2004).

20. King, O. D. et al. Predicting phenotype from patterns of annotation. Bioinformatics 19 (Suppl. 1), I183–I189 (2003).
Uses decision trees to predict phenotypes of yeast mutants on the basis of genes' annotations to GO and other phenotypic descriptions.

21. Tulipano, P. K., Millar, W. S. & Cimino, J. J. Linking molecular imaging terminology to the gene ontology (GO). Pac. Symp. Biocomput. 613–623 (2003).

22. Bodenreider, O., Mitchell, J. A. & McCray, A. T. Evaluation of the UMLS as a terminology and knowledge resource for biomedical informatics. Proc. AMIA Symp. 61–65 (2002).

23. Leroy, G. & Chen, H. Meeting medical terminology needs – the Ontology-Enhanced Medical Concept Mapper. IEEE Trans. Inf. Technol. Biomed. 5, 261–270 (2001).
Describes a query tool that involves the mapping of different concepts using human-created ontologies and natural language processing.

24. Bodenreider, O., Burgun, A. & Mitchell, J. A. Evaluation of WordNet as a source of lay knowledge for molecular biology and genetic diseases: a feasibility study. Stud. Health Technol. Inform. 95, 379–384 (2003).
Maps GO terms and NCBI's LocusLink terms to WordNet to determine the overlap between molecular biological and lay knowledge.

25. Judd, W. S., Campbell, C. S., Kellogg, E. A., Stevens, P. F. & Donoghue, M. J. Plant Systematics: A Phylogenetic Approach (Sinauer Associates, Inc., Sunderland, Massachusetts, 2002).

26. Cook, D. L., Farley, J. F. & Tapscott, S. J. A basis for a visual language for describing, archiving and analyzing functional models of complex biological systems. Genome Biol. 2, RESEARCH0012 (2001).
Provides a lexicon of icons to graphically represent molecular biology information.

27. Sigman, M. & Cecchi, G. A. Global organization of the WordNet lexicon. Proc. Natl Acad. Sci. USA 99, 1742–1747 (2002).
Applies graph theoretical calculations to analyse the organization of WordNet.

28. Ogata, H., Fujibuchi, W., Goto, S. & Kanehisa, M. A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res. 28, 4021–4028 (2000).
Uses graph comparison methods to correlate the genome locations of microbial genes and these organisms' metabolic pathways.

29. Bard, J. Ontologies: formalising biological knowledge for bioinformatics. Bioessays 25, 501–506 (2003).

30. Rosse, C. et al. Motivation and organizational principles for anatomical knowledge representation: the digital anatomist symbolic knowledge base. J. Am. Med. Inform. Assoc. 5, 17–40 (1998).
Proposes a human anatomy ontology that accommodates both the systemic and regional (topographical) views of anatomy.

31. Trombert-Paviot, B. et al. GALEN: a third generation terminology tool to support a multipurpose national coding system for surgical procedures. Int. J. Med. Inf. 58–59, 71–85 (2000).
Provides an information-management architecture for handling all types of clinical data in language-independent ways.

32. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).

33. Hill, D. P. et al. The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res. 32, D568–D571 (2004).

34. Noy, N. F. et al. Protege-2000: an open-source ontology-development and knowledge-acquisition environment. Proc. AMIA Symp. 953 (2003).


We thank the curators of the various animal, plant and prokaryote databases who participated in the mutant phenotype ontology meetings (see list of URLs in online links box for groups that participated). We are grateful to S. Aitkin for commenting on the material in box 1 and to M. Buzgo for providing the photographs in figure 4 and for helpful comments on the manuscript. S.Y.R. is supported in part by the National Science Foundation (NSF), and J.B.L.B. thanks the Biotechnology and Biological Sciences Research Council (BBSRC) for funding. This is Carnegie publication 1680.

We dedicate this paper to the late Robin Winter who articulated much of our knowledge about human congenital dysmorphologies and who is sorely missed.

Competing interests statement. The authors declare that they have no competing financial interests.