"The semantic approach allows systems to acquire an almost human cognition". No, this is not a quote from some manual - these are my own words. That's how I get people captivated by (strong opinion, strongly held) the most beautiful paradigm of Information Technology.
But let's take a look at what's behind these words, and whether the almost human cognitive ability of semantic inference is a blessing, a curse in disguise - or both.
Before we start, let's have a quick look at what Semantic Reasoning is and how to slice it.
To do that, a quick recap on Semantics is due.
If data is the prime matter of our business, the elements that express and materialize it, Semantic Architecture is the act of arranging that prime matter into a consistent universe, composed of the things that exist (classes), the attributes that describe them (datatype properties), and the links between them (object properties). Once the static picture of the business universe is ready, Semantic Architecture defines the laws of physics governing that universe: the meta-metadata, or annotations.
Here you can read more about the various demiurges that create the Semantic Universe.
A proper exercise in Semantics will deliver every class and every property with a comprehensible comment and clear label.
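To make the building blocks above tangible, here is a minimal sketch in plain Python. It assumes a toy model stored as (subject, predicate, object) triples; the names `is_a`, `label`, `comment` and the sample terms are hypothetical stand-ins for real vocabulary, not an actual ontology. The helper lists every class or property that is still missing a label or a comment.

```python
# Toy model as (subject, predicate, object) triples; all names are illustrative.
ontology = [
    ("Person", "is_a", "Class"),
    ("hasName", "is_a", "DatatypeProperty"),
    ("hasChild", "is_a", "ObjectProperty"),
    ("Person", "label", "Person"),
    ("Person", "comment", "A human being."),
    ("hasName", "label", "has name"),
    ("hasName", "comment", "The name a person is known by."),
    ("hasChild", "label", "has child"),
]


def missing_documentation(triples):
    """Return sorted (term, annotation) pairs for terms lacking a label or comment."""
    terms = {s for s, p, _ in triples if p == "is_a"}
    documented = {(s, p) for s, p, _ in triples if p in ("label", "comment")}
    return sorted(
        (term, annotation)
        for term in terms
        for annotation in ("label", "comment")
        if (term, annotation) not in documented
    )


gaps = missing_documentation(ontology)  # here, only hasChild lacks a comment
```

Because the annotations are themselves data, a check like this is just another query over the model.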
Interacting with an Alien
One day, an alien from a faraway galaxy comes to visit our universe. Their galaxy is so different from ours that, although they have the ability to understand language, they totally lack the vocabulary, save for a minimal set of nouns and verbs.
If we want them to stay safe and healthy during their visit, we must explain every concept in our world to them using the limited vocabulary they come equipped with.
We need to explain the meaning of "chocolate" to them. Not every chocolate bar they come across will have a nice shiny "CHOCOLATE" written on it, but all of them will have a list of ingredients - so we make sure the alien knows that a food item containing milk, sugar and cocoa butter is "chocolate".
We need to make sure they understand the meaning of "mother": "mother is a female that has at least 1 child".
And, for good or bad, we've just enabled our alien to perform semantic reasoning: to infer the class label of the food item, or to reason about the family trees.
This futuristic scenario is not a simplification at all: our Semantic Platform can be that alien, and our axioms - or sets of statements defining a resource in a machine-readable way - enable it to perform the reasoning and inference.
When a data source for food items comes with an unlabeled food item containing milk, sugar and cocoa butter, the system will be able to automatically label it as "chocolate" - and return it with any query that is looking for anything related to chocolate.
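As a rough illustration of that labeling step, the sketch below hard-codes the alien's definition of chocolate and applies it to unlabeled items. It is plain Python rather than a real reasoner, and the names `CHOCOLATE_INGREDIENTS` and `infer_labels` are made up for this example.

```python
# The alien's definition: anything containing all of these counts as chocolate.
CHOCOLATE_INGREDIENTS = frozenset({"milk", "sugar", "cocoa butter"})


def infer_labels(ingredients):
    """Return the class labels inferred from an item's list of ingredients."""
    labels = {"food item"}
    # Extra ingredients are allowed; the defining ones must all be present.
    if CHOCOLATE_INGREDIENTS <= set(ingredients):
        labels.add("chocolate")
    return labels


unlabeled_bar = ["sugar", "milk", "cocoa butter", "hazelnuts"]
plain_biscuit = ["flour", "sugar", "butter"]

bar_labels = infer_labels(unlabeled_bar)
biscuit_labels = infer_labels(plain_biscuit)
```

Any query for "chocolate" can now match `unlabeled_bar`, even though no data source ever called it that.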
And when a data source states that one individual is the "mother" of another, the system will be able to automatically deduce that the former is female and has a child, and that the latter can be labeled and returned as "child" whenever the system is queried.
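The same idea, sketched for the "mother" relationship: from one asserted statement, the function below derives the extra facts described above. The predicate and class names (`motherOf`, `hasChild`, `Female`, `Mother`, `Child`) are assumptions chosen for this illustration, and a real reasoner would derive them from axioms instead of hard-coded rules.

```python
def infer_from_mother(triples):
    """Derive new triples from every (mother, "motherOf", child) statement."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == "motherOf":
            inferred.add((subject, "type", "Female"))
            inferred.add((subject, "type", "Mother"))
            inferred.add((subject, "hasChild", obj))
            inferred.add((obj, "type", "Child"))
    # Only report facts that were not already asserted.
    return inferred - set(triples)


source = {("Alice", "motherOf", "Betty")}
facts = infer_from_mother(source)
```

Note what is not inferred: nothing is said about Betty being female or a mother, because the definition gives no grounds for it.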
People with Tails
So far, it seems like we just created an extra brain able to save us a lot of application logic. But alas, it's not so simple: semantic reasoning can - and will, if applied thoughtlessly - result in slow, cumbersome systems returning results that will feel random at best.
Let's come back to our definition of "mother".
"Mother is a female that has at least 1 child". An immediate question a thoughtful alien could pose is: if Alice is the mother of Betty, is Betty also the mother of Alice? It is easy to overlook but disastrous to experience: the symmetry of each relationship must be specified, otherwise we end up with the most unpredictable outcomes of the reasoning exercise.
"Mother" is an asymmetric relationship - it is only valid in one direction.
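A small sketch of what goes wrong, assuming triples held in a Python set and a hypothetical `motherOf` predicate: if the property is mistakenly treated as symmetric, the mirrored statement appears, and only an explicit asymmetry declaration lets us catch it.

```python
def apply_symmetry(triples, symmetric_properties):
    """Add the mirrored statement for every property declared symmetric."""
    closed = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            closed.add((o, p, s))
    return closed


def asymmetry_violations(triples, asymmetric_properties):
    """Return statements whose mirror image also holds for an asymmetric property."""
    return {
        (s, p, o)
        for s, p, o in triples
        if p in asymmetric_properties and (o, p, s) in triples
    }


data = {("Alice", "motherOf", "Betty")}

# Mistake: "motherOf" treated as symmetric, so Betty becomes Alice's mother too.
wrong = apply_symmetry(data, symmetric_properties={"motherOf"})

# With the asymmetry stated explicitly, the contradiction becomes detectable.
problems = asymmetry_violations(wrong, asymmetric_properties={"motherOf"})
```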
Moreover, we can safely specify that every child must have a mother - if there is no data, it only means there is no knowledge of the mother, not that there is no mother as such. Yet a female need not have a child - those without any data on children could be mothers, but could also have no children and hence not qualify as mothers.
In addition, can a transgender man be a mother? In this context, what exactly is the meaning of "female"? What is the meaning of "assigned sex", and what are the possible values? Which of these values should our system treat as "female"? Which of the attributes should the system interpret as "assigned sex", given that the words "sex" and "gender" are used interchangeably?
And now, with just one single predicate, we have cursed ourselves into navigating the ever-expanding philosophical domains around our resource. It can - and will - take time to swim out of them. And as we do, our ship is bound to crash on one edge case or another - after all, semantic reasoning does not, as such, "disregard" data: it will, by design, not hesitate to use every piece of it.
Reasoning takes extensive testing, and forgetting, omitting or misplacing a single axiom somewhere in a remote corner of our semantic universe may result in the rise of people with tails: literal human beings the system is convinced should come equipped with an extra appendage because someone, somewhere forgot to specify a disjointness axiom.
To add insult to injury, semantic reasoning is no magic - it takes computational power, slowing the system down.
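Here is one hypothetical way such a person with a tail could come about, sketched in plain Python. Assume two data sources were merged and the identifier "Charlie" ended up typed as both `Person` and `Monkey`, and assume a rule that every `Monkey` has a tail. All class names and the rule itself are invented for this illustration.

```python
HAS_TAIL = {"Monkey"}  # hypothetical rule: every Monkey has a tail


def people_with_tails(types_by_individual):
    """Return individuals typed as Person who also inherit a tail."""
    return sorted(
        name
        for name, types in types_by_individual.items()
        if "Person" in types and types & HAS_TAIL
    )


def disjointness_violations(types_by_individual, disjoint_pairs):
    """Return (individual, class_a, class_b) for each violated disjointness pair."""
    violations = []
    for name, types in sorted(types_by_individual.items()):
        for a, b in disjoint_pairs:
            if a in types and b in types:
                violations.append((name, a, b))
    return violations


merged = {"Charlie": {"Person", "Monkey"}, "Alice": {"Person"}}

tailed = people_with_tails(merged)                    # silently accepted
unchecked = disjointness_violations(merged, [])       # no disjoint declared
checked = disjointness_violations(merged, [("Person", "Monkey")])
```

Without the disjoint pair, nothing flags Charlie; with it, the conflict surfaces as an inconsistency instead of a confident wrong answer.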
My Universe, My Rules
So, does the Semantic Paradigm provide the solution to the reasoning problem? And if not, what is the use of the Semantic Paradigm?
Fortunately, besides reasoning, the Semantic Paradigm supports another glamorous concept: self-building application landscapes. Remember, RDF was created as a system of metadata, and there lies its greatest power. The metadata that resides right with the data, and IS also data in its own right - and hence can be queried as such - allows for the creation of self-building, intelligent applications. The logic of these applications is not embedded in the code or residing in some remote library: it is delivered by the data itself.
The good news is that, for a business purpose, we are creating a system that is specific to the business.
We know the use cases.
We know our technology.
We know how and when each piece of data (and metadata) will be used.
Hence, we can use our metadata to instruct the applications to reason - in a concrete, case-specific manner. We don't have an alien in our own business landscape - we have a smart and efficient robot!
We can use application support annotation properties to make sure any female with children is counted as "mother", but if there is any difference between the sex and gender of the individual, the label "mother" is not displayed to the end users. We can discard the data coming with a sex or gender outside of our restricted value list. We can define the threshold for inclusion/exclusion of such individuals from the "count of mothers", and we can, respecting privacy, be very unambiguous and avoid any risk of exposure to the end user, should any single criterion not be met, or any single data point be absent.
Moreover, by embedding these rules as custom annotation properties, we can improve performance by delegating the execution of this logic to a service - not the SPARQL engine.
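A minimal sketch of such a service, assuming the annotation values have already been read from the model into a plain dictionary. The keys in `MOTHER_RULES`, the record fields and the allowed value list are all hypothetical; the point is only that the function contains no business constants of its own and takes every decision from the metadata.

```python
# Rules as they might be loaded from custom annotation properties (assumed values).
MOTHER_RULES = {
    "allowed_values": {"female", "male"},  # restricted value list
    "required_sex": "female",
    "min_children": 1,                     # inclusion threshold
    "hide_label_if_sex_gender_differ": True,
}


def evaluate_mother(person, rules=MOTHER_RULES):
    """Return (counted, label) for one record, driven only by the rules dict."""
    sex = person.get("sex")
    gender = person.get("gender")
    children = person.get("children")
    # Any absent data point: do not count and do not expose a label.
    if sex is None or gender is None or children is None:
        return (False, None)
    # Discard records with values outside the restricted list.
    allowed = rules["allowed_values"]
    if sex not in allowed or gender not in allowed:
        return (False, None)
    if sex != rules["required_sex"] or len(children) < rules["min_children"]:
        return (False, None)
    # Counted as a mother, but the label stays hidden when sex and gender differ.
    if rules["hide_label_if_sex_gender_differ"] and sex != gender:
        return (True, None)
    return (True, "mother")


alice = {"sex": "female", "gender": "female", "children": ["Betty"]}
result = evaluate_mother(alice)
```

Changing the business policy then means editing the annotations, not redeploying the service.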
So, Why Reason?
There is still a use case for semantic reasoning. Remember how we stated that when the use cases are known and technology is controlled, we can delegate the reasoning to smart annotations and the application layer?
Academia is one space that does not serve a particular set of business use cases: it caters for a common, generic repository of knowledge.
In addition, the specific uses of that repository for business needs are not known to the Data Demiurges in the academic world, and are, all things considered, unpredictable.
For this case, the semantic reasoning framework provides a standardized way of documenting the rules and definitions in an unambiguous and testable manner.
Semantic reasoning is a powerful way to avoid relying on human interpretation when documenting the rules and definitions in the ontologies.
However, if you ever find yourself relying on an external ontology for a specific business case, I would advise you to consider replacing the many OWL axioms with annotations specifically designed to support your business, and creating services in the application layer that will take care of the metadata-driven execution of the logic.
That's owl for today!
by Andreas Ingvar van der Hoeven