From folksonomies to ontologies

Uche Ogbuji of Zepheira discusses how early adopters are introducing Semantic Web to the enterprise.

Interview conducted by Alan Morrison, Bo Parker, Bud Mathaisel, and Joe Mullich

Uche Ogbuji is a partner at Zepheira, LLC, a consultancy specializing in next-generation Web technologies. Ogbuji’s experience with enterprise Web data technologies reaches back to the inception of Extensible Markup Language (XML). In this interview, Ogbuji discusses how Zepheira helps companies with semantic interoperability issues, and he provides insight into the data silo problems organizations face.


PwC: What kinds of issues do Zepheira clients ask about?

UO: We’re a group of 10 folks who speak a lot at conferences, and I write a lot. So we very often get inquiries from people who say, “Something that you said or a case study that you presented caused a light bulb to go on in my head, in terms of how this can help my company.”

These are typically folks at a department level. They’re ambitious; they are looking to take their entire company to the next level by proving something in their department, and they recognize that the beauty of something like semantic technology is that it can build from a small kernel and accrete outwards in terms of value.

These department-level people have a practical problem that often involves data spread across many silos. If they could integrate that data better, they could get more value out of it. Obviously that’s an age-old problem, one that goes back to the primordial days of computing. But I think they see that this is an opportunity to use a very interesting technology that has some element of being proven on the Web and that allows things to be done on a small scale.

PwC: So sometimes they’re trying to organize structured data that is in silos as well as unstructured data?

UO: Right.

PwC: Speaking of structured and unstructured data, could you give us an example of how a department head might find the Semantic Web useful in that context?

UO: If you have a bunch of data in different files, some of it structured and some of it unstructured, you often have different systems developed in different areas at different times. The business rules included in them are different. They don’t quite match up. You have a lot of inefficiency, whether from the complexity of the integration process and code or the complexity of the day-to-day activities of a line-of-business person.

What we typically do on an engagement is try to capture what I would call schematic information, which is information about what relates to what. They’re deceptively simple links between entities in one silo and entities in another silo, so we’re not talking about a huge, formal, scientific, top-down modeling exercise. We’re talking about links that are almost at the level of social tagging, almost at the folksonomy level.

We’ve found that when you provide a basis for people to say that this entity, this sort of information in this silo relates to this other sort of information in this silo, then the people who are involved fill in the nooks and crannies. You don’t have to have this huge engineering effort to try to force a shared model between them.

So I think the benefit that department heads get from something like Semantic Web technology is that it’s designed to go from very slim threads and very slim connections, and then have those strengthened over time through human intervention.


There’s a large social element in building shared models, and once you have built those shared models, you have the social benefit of having people enfranchised in it. Some organizations had a situation where trying to do data governance was warfare, because of the competing initiatives. Now you have given people the capability to do it piecemeal collaboratively, and you have less of the warfare and more of the cooperation aspect, which improves the system that they’re developing.
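
The lightweight links described in this answer could be pictured as simple subject, relation, object statements, loosely in the spirit of RDF triples. The following Python sketch is only an illustration under stated assumptions: the silo names, record identifiers, relation names, and the `related` helper are all hypothetical and do not come from Zepheira’s tooling.

```python
# Each link is a (subject, relation, object) statement connecting an entity
# in one silo to an entity in another. All identifiers are hypothetical.
links = [
    ("silo_a:product/1001", "same_product_as", "silo_b:item-77"),
    ("silo_b:item-77", "priced_by", "silo_c:line-12"),
]


def related(entity, links):
    """Return every (relation, other_entity) pair that touches `entity`."""
    found = []
    for subject, relation, obj in links:
        if subject == entity:
            found.append((relation, obj))
        elif obj == entity:
            found.append((relation, subject))
    return found


# Later, someone who knows the data adds one more link. No shared model
# had to be agreed on first.
links.append(("silo_a:product/1001", "announced_in", "silo_d:page-3"))
```

Adding a link is just appending one more statement, which is one way a thin thread between silos might be strengthened over time by the people who know the data.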

PwC: Can you give us an example of a company that’s done this sort of collaboration?

UO: One concrete example is the work we did with the global director for content management at Sun Microsystems. Her office is in charge of all the main sun.com Web sites, including www.sun.com, the product sites, solutions, global versions of the sites, and the company’s business-to-business [B2B] catalogs. Her department had data, some of which was Oracle database content (warehouse-type data), a lot of which was XML [Extensible Markup Language], and some of which was spool files.

Governance was not in place to automate the pipelines between all that mess of silos. And getting out a coherent Web site was a pain. They had some real problems with price policy and traceability for, say, prices that appear on the catalog Web site. It used to be a very manual, intensive process of checking through everything. We worked with this department to put together a platform to create lightweight data models for the different aspects of product information that appeared on these Web sites, as well as to make those models visible to everyone.

Everyone could see the components of a lightweight data model and the business rules in a way that’s as close as possible to stuff that a line-of-business person could understand. That helped them head off major disagreements by dealing with all inconsistencies piecemeal. It’s not perfect, but now they have a quicker time to market for reliable product announcements and reliable information updates, and that was really valuable. And on the personal and social side of things, I’ve been very satisfied to see that the lady who brought us in has been promoted quite a few times since we’ve been working with her. Very often that’s the motivation of these people. They know it can be valuable, and they’re looking to do something special for their company.

PwC: What was the breakthrough that you alluded to earlier when you talked about the new ability to collaborate? While there used to be a data governance war and everybody had their own approach to the problem, what caused this ability to collaborate all of a sudden?

UO: It’s slightly different in each organization, but I think the general message is that it’s not a matter of top down. It’s modeling from the bottom up. The method is that you want to record as much agreement as you can. You also record the disagreements, but you let them go as long as they’re recorded. You don’t try to hammer them down. In traditional modeling, global consistency of the model is paramount. The semantic technology idea turns that completely on its head, and basically the idea is that global consistency would be great. Everyone would love that, but the reality is that there’s not even global consistency in what people are carrying around in their brains, so there’s no way that that’s going to reflect into the computer.


You’re always going to have difficulties and mismatches, and, again, it will turn into a war, because people will realize the political weight of the decisions that are being made. There’s no scope for disagreement in the traditional top-down model. With the bottom-up modeling approach you still have the disagreements, but what you do is you record them.
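
One way to picture recording disagreements instead of resolving them is a shared record that keeps every department’s definition side by side. This is a minimal Python sketch, not the method used in any engagement mentioned here; the `TermRecord` class, the department names, and the definitions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Holds each department's definition of one business term (hypothetical)."""
    term: str
    definitions: dict = field(default_factory=dict)  # department -> definition

    def record(self, department, definition):
        # No department's view is rejected or replaced by another's;
        # each one is simply stored under its own name.
        self.definitions[department] = definition

    def status(self):
        # Agreement means every recorded definition is identical.
        distinct = set(self.definitions.values())
        return "agreed" if len(distinct) <= 1 else "disagreement recorded"


list_price = TermRecord("list price")
list_price.record("catalog", "price before discounts")
list_price.record("marketing", "price before discounts")
status_before = list_price.status()

# A third department sees it differently; the difference is kept, not forced away.
list_price.record("finance", "price after regional adjustment")
status_after = list_price.status()
```

In this sketch the model never has to be globally consistent: the mismatch is visible to everyone and can be revisited later, which is the point of the bottom-up approach described above.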

PwC: Have you begun to understand the opportunity here for a class of business problems that have been heretofore either not solvable or too expensive to solve with traditional approaches and that define a continuum from purely Semantic Web value possibilities to purely highly structured and controlled vocabularies?

UO: You would not want a semantic technology-driven system whose end point is the XBRL [Extensible Business Reporting Language] filing to the SEC [Securities and Exchange Commission]. That would be an absolute disaster. So there is absolutely a continuum, from departments and use cases where this is appropriate, through cases where it’s a hybrid, to cases where you need very, very structured, centralized control. The XBRL example is a great one. XBRL is semantic technology in itself because of the way its taxonomies use links. It doesn’t use RDF [Resource Description Framework], but it does use taxonomic links that are basically the same as RDF except for the actual tag format.

The largest companies have to file in XBRL. To meet those XBRL filing mandates, a lot of companies have centralized departments—sometimes within IT or within accounting’s own shadow IT—pull all the reports. Even ERP [enterprise resource planning] and things like it do not feed straight into the XBRL system. They have a firewall, very often, and I’m not an expert at XBRL implementations, but I’m very familiar with the space, and this is what I’ve understood. They have a firewall even between the centralized, highly controlled ERP of the enterprise and what goes into that XBRL filing, because even when you have something as highly controlled as, say, an enterprisewide ERP, it is not necessarily considered safe enough from the point of view of tight control by the party responsible for reporting.

It’s not a problem unique to semantic technology. Let’s say you had a situation where you had semantic technology on one end and you wanted information from that to go into a filing. You would still want the same sort of firewall where the auditors and the other experts could look at the semantic technology’s surface version of the truth as an input, but they would still decide what goes into the actual numbers for the filing.

