There is a fair amount of confusion about who is who in the Semantic world, and about what the professions of Taxonomist, Data Librarian, Ontologist (no, not "oncologist", as LinkedIn would suggest) and Semantic Architect actually involve.
I will lay out my view on this topic. The definitions I am going to use are not industry-standard, which means people may still use these terms interchangeably or (semantics!) assign completely different meanings to them.
Let's have a look at the Semantic Heroes.
To better understand who they are, let's first do a quick recap of the topic they are curating.
The Semantic universe, in its applied, non-academic definition, is a representation of the internal and external concepts relevant to the business. Every concept is unambiguously defined, and the concepts are linked to each other. They have attributes, and some of them comprise a finite list of individual instances, each described in a way that distinguishes it from other instances of the same concept.
Let's imagine we are working on a Circular Economy (CE) project. For the sake of simplicity (there will be a lot of simplification in this article), let's limit our scope to the circularity of buildings.
Our universe is then focused on buildings and everything about and around them that can be relevant for materials recovery.
In our universe, we may have logical groups of concepts, or domains, like Building, Material, Usage, and Climate. We are going to keep it very basic and define how these concepts relate to each other.
Buildings contain parts that are built of Materials; conversely, Materials are located in Buildings. Buildings are situated in geographic locations for which Climate data is available. Since a Location is a concept in itself, we will add it to the list of domains in our universe. Finally, Buildings are subject to Usage.
There are millions of things that describe a Building, but let's say that for our CE purposes we only care about the year of construction and the relevant repair works carried out on it. Since repair activities have their own attributes, such as the nature of the repair and its date, they fully deserve to be expressed as an independent concept too.
Now we have created a very crude and partial ontology - a semantic representation of the business expressed in terms of classes (things that exist) and predicates (attributes and links). For better or worse, for the last few minutes we have been doing the Ontologist's job.
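To make this a little more tangible, here is a minimal sketch of that crude ontology written down as subject-predicate-object links, in plain Python rather than RDF. The predicate names (has_part_built_of, located_in, has_climate_data, subject_to, underwent) and the RepairWork class name are hypothetical labels I chose to mirror the prose; they are not a standard vocabulary.

```python
# A toy ontology: classes (things that exist) linked by predicates.
# All class and predicate names are hypothetical, chosen to mirror the prose.
CLASSES = {"Building", "Material", "Usage", "Climate", "Location", "RepairWork"}

# (subject class, predicate, object class): links between concepts
LINKS = [
    ("Building", "has_part_built_of", "Material"),
    ("Material", "located_in", "Building"),  # the same relation read the other way
    ("Building", "located_in", "Location"),
    ("Location", "has_climate_data", "Climate"),
    ("Building", "subject_to", "Usage"),
    ("Building", "underwent", "RepairWork"),
]

# (class, attribute): literal-valued predicates
ATTRIBUTES = [
    ("Building", "year_of_construction"),
    ("RepairWork", "nature_of_repair"),
    ("RepairWork", "date"),
]


def links_from(cls: str) -> list[tuple[str, str]]:
    """Return (predicate, target class) pairs leaving the given class."""
    return [(p, o) for s, p, o in LINKS if s == cls]


def check_ontology() -> None:
    """Fail if any link or attribute refers to an undeclared class."""
    for s, _, o in LINKS:
        assert s in CLASSES and o in CLASSES, (s, o)
    for c, _ in ATTRIBUTES:
        assert c in CLASSES, c


check_ontology()
print(links_from("Building"))
# → [('has_part_built_of', 'Material'), ('located_in', 'Location'), ('subject_to', 'Usage'), ('underwent', 'RepairWork')]
```

In a real RDF/OWL model, the two Building/Material links would more likely be declared as inverse properties of each other rather than listed twice.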
An Ontologist is the Semantic hero who creates and optimizes* semantic representations.
They are a Business Architect with a strong drive to arrange and organize concepts.
They are a might-have-been Enterprise Information Architect and a Strategist.
And they are familiar with RDF and Linked Data concepts and optimization techniques.
That said, demanding that they hold an ICT degree or be able to write code is unreasonable.
Let's move on and take a look at the domain of Materials. There are multiple groups of Materials, one of which, called Steel, is described by its carbon content, chromium content and density, as well as a "composed_of" link to other Materials. Let's leave it at that, even though in the real world the attributes of the various groups of Materials could fill a book of their own.
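One way to picture the Steel group is as a class carrying exactly those three attributes plus the "composed_of" link. The sketch below uses Python dataclasses; the field names, units and the Stainless Steel values are my own illustrative assumptions, not reference data.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    """A Material individual; composed_of links it to other Materials."""
    name: str
    composed_of: list["Material"] = field(default_factory=list)


@dataclass
class Steel(Material):
    """The Steel group: carbon content, chromium content and density."""
    carbon_content_pct: float = 0.0    # percent by mass (assumed unit)
    chromium_content_pct: float = 0.0  # percent by mass (assumed unit)
    density_kg_m3: float = 0.0         # kilograms per cubic metre (assumed unit)


# Illustrative individual; the values are placeholders, not curated data.
iron, carbon, chromium = Material("Iron"), Material("Carbon"), Material("Chromium")
stainless = Steel(
    "Stainless Steel",
    composed_of=[iron, carbon, chromium],
    carbon_content_pct=0.08,
    chromium_content_pct=18.0,
    density_kg_m3=8000.0,
)
print([c.name for c in stainless.composed_of])  # → ['Iron', 'Carbon', 'Chromium']
```

Making Steel a subclass of Material keeps the "composed_of" link available to every group, while the Steel-specific attributes live only where they apply.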
In our universe, only a limited number of materials exist. Moreover, we don't want them to float in with the data as it enters our systems; we want to maintain total, centralized control over them. In a way, we can say that Materials are Reference Data - fairly static, centrally managed and represented by a finite list of individuals, called a taxonomy. We can list these individuals right within our universe.
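The "total and centralized control" part can be sketched as a closed list that incoming data must map onto, rather than a list that grows whenever a new label shows up. The MaterialRef members and the resolve_material helper below are hypothetical, for illustration only.

```python
from enum import Enum


class MaterialRef(Enum):
    """Centrally managed reference list (taxonomy) of Material individuals.

    Hypothetical members for illustration; the real list is curated
    centrally, not derived from incoming data.
    """
    CAST_IRON = "Cast Iron"
    CARBON_STEEL = "Carbon Steel"
    STAINLESS_STEEL = "Stainless Steel"


def resolve_material(label: str) -> MaterialRef:
    """Map an incoming label onto the reference list, or reject it.

    Unknown labels raise an error instead of silently creating a new
    Material, so the list stays under central control.
    """
    normalized = label.strip().lower()
    for member in MaterialRef:
        if member.value.lower() == normalized:
            return member
    raise ValueError(f"Unknown material {label!r}: not in reference data")


print(resolve_material("  cast iron "))  # → MaterialRef.CAST_IRON
```

Rejected labels would then go to whoever curates the list for review, instead of entering the universe unchecked.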
For example, Cast Iron is a member of the Steel group of Materials and has a carbon content above 2.1%.
Any Material that is an Iron and has a carbon content above 2.1% is Cast Iron. This is how we identify it when it does not come tagged as "Cast Iron" in our input data: here we rely on semantic inference to do the magic for us.
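In OWL, a rule like this would typically be expressed as a class definition with a datatype restriction and left to a reasoner. As a rough, procedural stand-in, here is the same rule in plain Python. It assumes group membership is recorded as the string "Steel", following the Cast Iron example above, and the Observation record and infer_label helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the group name follows the "Steel group" used in the example above.
FERROUS_GROUP = "Steel"
CAST_IRON_MIN_CARBON_PCT = 2.1


@dataclass
class Observation:
    """A Material record as it arrives in the input data (hypothetical shape)."""
    record_id: str
    group: str
    carbon_content_pct: float    # percent by mass
    label: Optional[str] = None  # may be missing in the input


def infer_label(obs: Observation) -> Optional[str]:
    """Return the asserted label, or infer "Cast Iron" when it is missing.

    Rule: a member of the ferrous group with carbon content above 2.1%
    is Cast Iron. Returns None when nothing can be inferred.
    """
    if obs.label is not None:
        return obs.label
    if obs.group == FERROUS_GROUP and obs.carbon_content_pct > CAST_IRON_MIN_CARBON_PCT:
        return "Cast Iron"
    return None


print(infer_label(Observation("m1", "Steel", 3.4)))  # → Cast Iron
```

One difference worth keeping in mind: an OWL reasoner would add Cast Iron as an extra inferred type alongside any asserted ones, whereas this sketch simply prefers an existing label.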
Defining groups of materials and describing Cast Iron in terms that allow it to be inferred from its attribute values, as we have just done, is the job of the Taxonomist.
The Taxonomist is the Semantic hero concerned with inference rules and Reference Data management in our Semantic landscape.
They are the OWL (not the bird, but the Web Ontology Language the Semantic Web uses to create statements that can be reasoned over) experts.
They are consummate Business Analysts or Domain Experts.
They should be technical enough to test the taxonomies.
So far, we have been looking at a static picture of the universe in terms of the things it contains. Now it is time for us demiurges to define the laws of physics by which the universe is going to function.
Where does the data come from? What kind of quality checks are necessary? How should the applications treat each kind of data?
For example, we can get a Climate attribute such as humidity from two sources. One provides its data under a Creative Commons Zero (CC0) license, while the other prohibits the use of its data for commercial purposes. If our business provides Data Services, whereby it supplies data to other companies, input from the latter source cannot legally be included. Not only shall we link the attribute "humidity" to the two sources, which we will also define as items in our universe; we shall also make sure that the second source carries an annotation that the application exporting data for Data Services can read, so that anything coming from that source is excluded.
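Here is a minimal sketch of that setup, assuming each value keeps a reference to its source (lineage) and each source carries a machine-readable commercial-use flag (the annotation). The source names, the flag name and the export function are hypothetical; the license identifiers are standard SPDX IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A data source, itself defined as an item in the semantic universe."""
    name: str
    license: str                  # SPDX identifier, e.g. "CC0-1.0"
    commercial_use_allowed: bool  # annotation read by the export application


@dataclass(frozen=True)
class Measurement:
    attribute: str   # e.g. "humidity"
    value: float
    source: Source   # lineage: where the value came from


def export_for_data_services(rows: list[Measurement]) -> list[Measurement]:
    """Keep only values whose source permits commercial reuse."""
    return [m for m in rows if m.source.commercial_use_allowed]


# Hypothetical sources and values, for illustration only.
open_feed = Source("Open Climate Feed", "CC0-1.0", True)
nc_feed = Source("Research Climate Feed", "CC-BY-NC-4.0", False)
rows = [
    Measurement("humidity", 0.62, open_feed),
    Measurement("humidity", 0.58, nc_feed),
]
exported = export_for_data_services(rows)
print([m.source.name for m in exported])  # → ['Open Climate Feed']
```

The point of the design is that the export application never hard-codes which sources to drop; it only reads the annotation, so a change in a source's license is handled by updating the annotation in one place.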
The professional taking care of Data Lineage (or provenance), metadata (or annotations) and application instructions is the Data Librarian. Even though many people frown at the name, because "librarian" tends to be associated with boring lists and archives, I find Data Librarianship to be one of the most exciting professions of today.
Bridging Semantics, Application Architecture and Compliance, Data Librarians help unlock the full power of the Semantic Paradigm.
Data Librarians come in different flavors, and it is likely that for a full-on Semantic implementation you are going to need more than one:
an IT professional who understands application development, can do some coding, and is familiar with Data Quality, Metadata Management and Data Lineage concepts;
a strong Business Analyst with an eye for compliance and experience with Data Sourcing.
Needless to say, both profiles need to embrace Linked Data concepts.
Now, our universe is almost ready to go. The only thing missing is plugging it into the rest of our business and IT landscape, and that is what a Semantic Architect does.
A Semantic Architect is an Enterprise Architect who delivers the integration of the Semantic universe into the landscape of the business and designs the necessary governance layer around it.
They are the person working with Application Architects to design data movements in and out of the Semantic datastore or Semantic representation (if it remains virtual).
They are the Information Management expert that will make sure the governance rules, roles, and technologies are in place to ensure quality, availability, and accessibility of the Semantic data.
They are also able to operate at a high level, making sure the Semantic concepts are understood by stakeholders.
Staffing the Semantic Initiative
The remaining question is: do I need all four of these roles for my Semantic initiative?
The answer depends fully on the scope and complexity. It is possible to start with just one Semantic Architect and grow the team by adding more specialized roles as the scope grows and diversification becomes necessary.
Depending on the kind of problem being solved, you may be able to make do without a specialized Taxonomist or Data Librarian, with their responsibilities picked up by a senior Ontologist or Semantic Architect. Conversely, in a solution centered around tagging and category management, a Taxonomist might be able to provide all the necessary skills.
Moreover, you may discover that as your Semantic delivery goes from Proof of Concept to an integral part of the landscape, some people in your organization emerge as Semantic Natives - people with a natural ability to embrace the Semantic Paradigm. They can upskill themselves into one of the semantic roles.
As usual, feel free to ping me if you would like to learn more, or if you would like some support with the job description for a Semantic hero you are hiring.
That's owl for today!
*Note that optimization can target performance, flexibility, or the ability to accommodate unknown unknowns.