Notes: Ontology creation for cognitive computing

The creation of ontologies continues to slow down many cognitive computing projects. Here are some notes from a quick exploration.

From Wikipedia:

“Ontology is the philosophical study of the nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology often deals with questions concerning what entities exist or may be said to exist, and how such entities may be grouped, related within a hierarchy, and subdivided according to similarities and differences. Although ontology as a philosophical enterprise is highly theoretical, it also has practical application in information science and technology, such as ontology engineering.”

More relevant to cognitive computing is the field of “ontology engineering”. Here’s the definition from Wikipedia:

“Ontology engineering in computer science and information science is a field which studies the methods and methodologies for building ontologies: formal representations of a set of concepts within a domain and the relationships between those concepts. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering.[2] Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.”

This book looks useful for getting to the next level of detail, although it is somewhat dated (2004):

Ontological Engineering: With examples from the areas of Knowledge Management, e-Commerce and the Semantic Web

Fragments of reviews for the book:

  • “Also discussed in the book, and of enormous practical interest, is the automation of the ontology building process. Called ‘ontology learning’ by the authors, they discuss a few of the ways in which this could take place. One of these methods concerns ontology learning using a ‘corpus of texts’, and involves being able to distinguish between the ‘linguistic’ and ‘conceptual’ levels. Knowledge at the linguistic level is described in linguistic terms, while at the conceptual level in terms of concepts and the relations between them. Ontology learning is thus dependent on how the linguistic structures are exemplified in the conceptual level. Relations at the conceptual level for example could be extracted from sequences of words in the text that conform to a certain pattern. Another method comes from data mining and involves the use of association rules to find relations between concepts. The authors discuss two well-known methods for ontology learning from texts. Both of these methods are interesting in that they can apparently learn in contexts or environments that are not domain-specific. Being able to learn over different domains is very important from the standpoint of the artificial intelligence community and these methods are a step in that direction. The processes of ‘alignment’, ‘merging’, and ‘cooperative construction’ of ontologies that are discussed in the book are also of great interest in artificial intelligence, since they too will be of assistance in the attempt to design a machine that can reason over multiple domains.”
  • “The automation of ontology building would of course be a major advance. To accomplish this however would require that the machine be able to simultaneously and recursively construct the knowledge base and reason over it effectively. This is a formidable challenge indeed.”
  • “A large portion of the book describes the acute problem of somehow extracting meaning in a programmatic manner from data. Because the manual making of an ontology simply does not seem to scale, given the realities of gigabyte databases. We see that there is a natural decomposition of the problem into a linguistic step and a conceptual step. The former is tied to a particular human language. The latter is the nut of the problem. Current methods look promising, but are certainly not the last word.”
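
The association-rule idea mentioned in the first review can be illustrated with a small sketch. This is not the book's method, only a minimal example of support and confidence computed over concept co-occurrence. The toy corpus, the thresholds, and the function name `association_rules` are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical toy corpus: each set holds the concepts already
# recognised in one sentence (concept recognition is assumed done).
SENTENCES = [
    {"hotel", "room", "price"},
    {"hotel", "room"},
    {"hotel", "city"},
    {"room", "price"},
    {"city", "museum"},
]


def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples.

    support    = share of sentences containing both concepts
    confidence = share of sentences with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    concepts = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(concepts, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


for a, b, support, confidence in association_rules(SENTENCES):
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

A rule such as "price -> room" only says that the two concepts tend to appear together. Deciding what kind of relation links them still needs a person or a further labeling step.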


Wikipedia talks about Ontology Learning:

“Ontology learning (ontology extraction, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain’s terms and the relationships between those concepts from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time consuming, there is great motivation to automate the process.

Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical[1] or symbolic[2][3] techniques are used to extract relation signatures, often based on pattern-based[4] or definition-based[5] hypernym extraction techniques.”
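
As a rough illustration of pattern-based hypernym extraction, here is a minimal sketch that applies a single lexico-syntactic pattern ("X such as A, B and C") using a regular expression. It skips the part-of-speech tagging and chunking step entirely and assumes every noun phrase is a single word, which a real system would not do. The function name `extract_hypernyms` and the example sentence are hypothetical.

```python
import re

# One lexico-syntactic pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption: noun phrases are single words; a real pipeline would
# use a tagger and chunker to find multi-word phrases first.
PATTERN = re.compile(
    r"(\w+),? such as ((?:\w+(?:, and |, or |, | and | or ))*\w+)"
)
SEPARATORS = re.compile(r", and |, or |, | and | or ")


def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text.lower()):
        hypernym = match.group(1)
        hyponyms = SEPARATORS.split(match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms)
    return pairs


print(extract_hypernyms("Fruits such as apples, pears and plums are sold here."))
```

Each extracted pair is a candidate taxonomic ("is a") relation. The sketch does no lemmatisation, so plural forms are kept as they appear in the text.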

A few extracts from:

Wong, W., Liu, W. & Bennamoun, M. (2012), “Ontology Learning from Text: A Look back and into the Future”. ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.

This article is somewhat focused on building ontologies for the semantic web, but has interesting observations on the state of the art for automated ontology creation.

  • “Ontologies can be thought of as directed graphs consisting of concepts as nodes and relations as the edges between the nodes. A concept is essentially a mental symbol often realized by a corresponding lexical representation (i.e., natural language name). For instance, the concept “food” denotes the set of all substances that can be consumed for nutrition or pleasure. In Information Science, an ontology is a “formal, explicit specification of a shared conceptualisation” [Gruber 1993].”
  • “There are five types of output in ontology learning, namely, terms, concepts, taxonomic relations, non-taxonomic relations, and axioms. Some researchers [Buitelaar et al. 2005] refer to this as the ontology learning layer cake.”
  • “In document retrieval, the object of evaluation is documents and how well systems provide documents that satisfy user queries, either qualitatively or quantitatively. However, in ontology learning, we cannot simply measure how well a system constructs an ontology without raising more questions. For instance, is the ontology good enough? If so, with respect to what application?”
  • “Since the publication of the five survey papers [Ding and Foo 2002; Gomez-Perez and Manzano-Macho 2003; Shamsfard and Barforoush 2003; Buitelaar et al. 2005; Zhou 2007], research activities within the ontology learning community have largely been focused on improving (1) term extraction and concept formation and (2) relation discovery techniques. The learning of ontologies (3) from social data and (4) across different languages has also been a topic of great research interest in the later part of the past decade.”
  • “Besides the social dimension of ontology creation, ontology learning from multilingual text is also gaining popularity. Hjelm and Volk [Hjelm and Volk 2011; Hjelm 2009] discussed ways to automatically construct ontologies by exploiting cross-language information from parallel corpora.”
  • On Scoring and Extracting Terms: “The current state of the art is based mainly on statistical semantics and paradigmatic and syntagmatic relations, that is to say, we determine the relevance of terms through observations in very large samples and through the way the constituents of a term are put together.”
  • “… for taxonomic and non-taxonomic relation discovery, we are witnessing the increasing application of lexico-syntactic patterns, association rule mining, and rules based on syntactic dependencies on very large datasets from the Web.”
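
The "directed graph" view from the first extract can be made concrete with a small sketch. This is only an illustration under simple assumptions: concepts are plain strings, relations are labelled edges, and the class name `Ontology`, the relation labels, and the example concepts are hypothetical choices, not anything taken from the survey.

```python
from collections import defaultdict


class Ontology:
    """Concepts as nodes, labelled relations as directed edges."""

    def __init__(self):
        # source concept -> set of (relation label, target concept)
        self.edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        self.edges[source].add((relation, target))

    def ancestors(self, concept, relation="is_a"):
        """Follow one relation type transitively, e.g. taxonomic 'is_a' edges."""
        seen = set()
        stack = [concept]
        while stack:
            node = stack.pop()
            for rel, target in self.edges.get(node, ()):
                if rel == relation and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


onto = Ontology()
onto.add_relation("apple", "is_a", "fruit")      # taxonomic relation
onto.add_relation("fruit", "is_a", "food")       # taxonomic relation
onto.add_relation("apple", "grows_on", "tree")   # non-taxonomic relation

print(sorted(onto.ancestors("apple")))
```

The sketch covers only three layers of the "layer cake" (concepts, taxonomic relations, non-taxonomic relations). Terms and axioms would need extra structure, such as a mapping from surface strings to concepts and a rule language.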

Fragments from: (book introduction, 2005)

  • Ontology learning has become a major area of research within the wider area of artificial intelligence and natural language processing. This is largely due to the adoption of ontologies (especially formal ontology expressed in OWL) as the standard form of knowledge representation in the Semantic Web.
  • By a judicious selection of techniques ranging from part-of-speech tagging, chunking, and parsing to clustering and IR methodologies, they attempt to deal with the three fundamental issues involved in constructing ontologies: associating terms, building hierarchies of terms and concepts, and identifying and labeling ontological relations.
  • “ontology-learning layer cake,”

Fragments from chapter one of “An introduction to ontology learning”, Lehmann and Völker, 2014

  • Ontology learning approaches are as heterogeneous as the sources of data on the web, and as different from one another as the types of knowledge representations called “ontologies”
  • There is no general agreement on which requirements the formal representation needs to satisfy in order to be appropriately called an ontology. Depending on the particular point of view, ontologies can be simple dictionaries, taxonomies, thesauri, or richly axiomatized top-level formalisations
  • Ontologies play a central role in data and knowledge integration. By providing a shared schema, they facilitate query answering and reasoning over disparate data sources
  • However, the construction of ontologies is a highly expensive task which crucially hinges on the availability of scarce expert resources [39]. In order to build a formal ontology for a particular domain of interest, for instance, specialized domain knowledge needs to be acquired and formalized in a way that automated inference will yield the expected results. This goal can only be achieved if domain experts collaborate with skilled ontology engineers familiar with the theory and practice of knowledge representation – and once the ontology has been constructed, evolving knowledge and application requirements will demand for continuous maintenance efforts [[reference 39 is: Elena Simperl, Tobias Buerger, Simon Hangl, Stephan Woelger, and Igor Popov. Ontocom: A reliable cost estimation method for ontology development projects. Web Semantics: Science, Services and Agents on the World Wide Web, 16(0):1 – 16, 2012]]
  • One grouping: Ontology Learning from Text mostly focuses on the automatic or semi-automatic generation of lightweight taxonomies by means of text mining and information extraction. Many of the methods used in ontology learning from text (e.g. lexicosyntactic patterns for hyponymy detection or named-entity classification) are inspired by previous work in the field of computational linguistics, essentially designed in order to facilitate the acquisition of lexical information from corpora. Some ontology learning approaches do not derive schematic structures, but focus on the data level. Such ontology population methods derive facts from text. A popular example is the Never-Ending Language Learning (NELL) project [10], which reads the web to add statements to its knowledge base and improves its performance over time, e.g. via user feedback.

My reading, overall, is that the creation of ontologies remains a time-consuming exercise for experts and a largely unsolved problem for automated systems.


Other references:

2008, Ontology Engineering – The DOGMA Approach, Jarrar and Meersman

Tutorial on Ontological Engineering: “Part 3: Advanced course of ontological engineering”, Riichiro Mizoguchi



Fragment: Symbolic vs Sub-symbolic AI

The symbolic-AI camp models knowledge as specific, explicitly represented, objective facts that are manipulated by formal, repeatable rules. The sub-symbolic or connectionist camp instead builds systems that adapt, in hard-to-analyze ways, to perform actions and anticipate things. Such systems seem to demonstrate knowledge, but the knowledge itself can’t easily be understood or extracted as a list of explicit facts or rules.
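
A toy contrast may help make the distinction concrete. The sketch below is only an illustration, with hypothetical names and data throughout: the symbolic half stores explicit facts and applies one readable rule, while the sub-symbolic half trains a single perceptron whose "knowledge" of logical AND ends up as three numbers.

```python
# Symbolic: explicit facts plus one explicit, inspectable rule.
PARENT = {("ann", "bob"), ("bob", "cid")}  # hypothetical facts


def grandparents(parent_facts):
    """Rule: grandparent(x, z) if parent(x, y) and parent(y, z)."""
    return {
        (x, z)
        for x, y in parent_facts
        for y2, z in parent_facts
        if y == y2
    }


# Sub-symbolic: a perceptron learns logical AND from examples.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def train_perceptron(examples, epochs=20, rate=1.0):
    """Classic perceptron update rule, starting from zero weights."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - output
            w1 += rate * error * x1
            w2 += rate * error * x2
            bias += rate * error
    return w1, w2, bias


def predict(weights, x1, x2):
    w1, w2, bias = weights
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


weights = train_perceptron(EXAMPLES)
print(grandparents(PARENT))   # derived by a rule anyone can read
print(weights)                # learned numbers, with no rule to read off
```

With one perceptron the weights are still easy to interpret. The point of the contrast is that this stops being true as such systems grow, whereas the rule in the symbolic half stays readable at any scale.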