Semantic technologies at the ecosystem level

Frank Chum of Chevron talks about the need for shared ontologies in the oil and gas industry.

Interview conducted by Alan Morrison and Bo Parker

Frank Chum is an enterprise architect at Chevron whose career as a computer scientist spans several decades. During the 1980s, he worked at Coopers & Lybrand and Texaco, where he focused on artificial intelligence. In December 2008, he co-chaired a World Wide Web Consortium (W3C) workshop that Chevron hosted about the Semantic Web in the oil and gas industry.

In this interview, Chum discusses the role of semantics in knowledge-intensive industries, Chevron’s major steps to take advantage of Semantic Web techniques, and how the oil and gas industry hopes to emulate the healthcare industry’s efforts in ontology development.

PwC: How will the Semantic Web address some of the business issues confronting Chevron?

FC: We spend a lot of time trying to find information. The area we call information management deals with unstructured as well as structured information. To help our geoscientists and engineers find the information they need to do their jobs, we need to add more accuracy, more meaning into their searches. The goal is to let them find not just what they’re looking for, but also find things that they might not know existed.

PwC: At the end of the day, is this fundamentally about the business of looking for and extracting oil?

FC: Correct.

PwC: So there is a business decision at some point about what to do in a particular situation, given the information presented—whether to drill, whether to buy or lease or go into a joint venture with somebody else. Is that ultimately the funnel that this all points to?

FC: Yes. Actually, years ago as an artificial intelligence [AI] specialist with Texaco, I built a system for analogical reasoning. We modeled the important basins of the world that share certain characteristics. With that information, we were able to compare fields or sites that have similar characteristics and that probably would have the same type of oil production performance. That comparison involved the notion of inferencing—doing case-based reasoning, analogical reasoning.

Right now, analogical reasoning is very doable, and the benefits for the oil and gas industry compare with those for the healthcare and life sciences industries. They have similar issues from the vantage point of drug discovery, protein mapping, and the like, so we’re looking at those industries to try to model ourselves after them.

PwC: The W3C [World Wide Web Consortium] has a working group in that area. Do you hope to collaborate with some other folks in that group on a shared problem?

FC: Yes. When we joined the W3C and were working with them in Semantic Web areas, Roger Cutler, who’s here at Chevron, joined the Semantic Web Health Care and Life Sciences Interest Group. He didn’t have any connection to the industry-specific subjects they were talking about—he joined to learn from them how they were using the Semantic Web in a practical, industry setting. And so that’s part of the reason why we had the oil and gas workshop—because we think we need a critical mass to do in the oil and gas industry what healthcare has done and to advance through that kind of industry collaboration.

PwC: What are you seeing in the Semantic Web Health Care and Life Sciences Interest Group that you specifically want to emulate?

FC: You wouldn’t think that pharmaceutical companies would share a lot of information, but in fact the opposite is true, because the research investment needed to develop a drug runs into the billions of dollars. It’s the same thing with oil companies; the amount of money invested is in the billions. Building an offshore platform can be a multibillion-dollar venture if it is in a hostile environment like deep water or the Arctic, and these extremely expensive undertakings are often joint ventures. So we need to be able to share lots of information not only with our joint venture partners, but also with the design and engineering companies that design the platforms as well as with the people who are going to manufacture the platforms, fabricate them, and put them in place. That’s a lot of information, and without standardization, it would be difficult to share.

PwC: Does the standardization effort start with nomenclature? It seems like that would be really important for any set of business ecosystem partners.

FC: Yes. ISO 15926 is one initiative. There are many potentially confusing terms in drilling and production that benefit from having a common nomenclature. We also have standards such as PRODML—production markup language—and many other standards associated with exchanging data.

PwC: You’ve been involved in the oil and gas industry for quite a while. If you think about the history of that industry, how does the Semantic Web represent building on top of what’s come before? Or is it throwing a lot of stuff away and starting over?

FC: I think it’s a different approach. I think the Semantic Web actually provides maturity to AI, in a sense. Patrick Winston of MIT [Massachusetts Institute of Technology] said of AI, “You need to consider AI not as a whole system, but actually as a little piece of a system that would make the system work more or better.” Consider AI as raisins in a loaf of bread to make the loaf of bread more flavorful. People thought of AI as an AI system, entirely a big thing, rather than that nugget that helps you to enhance the performance. I think the Semantic Web is the same thing, because you’re looking at the Web as a platform, right, and data semantics are that nugget that makes the Web more meaningful, because a machine can understand information and process it without human intervention and, more importantly, make the connections between the information that’s available on the Web. That is the power of Linked Data.

PwC: Is there a sense that you’re trying to do something now that you would not have tried to do before?

FC: Four things are going on here. First, the Semantic Web lets you be more expressive in the business logic, to add more contextual meaning. Second, it lets you be more flexible, so that you don’t have to have everything fully specified before you start building. Then, third, it allows you to do inferencing, so that you can perform discovery on the basis of rules and axioms. Fourth, it improves the interoperability of systems, which allows you to share across the spectrum of the business ecosystem. With all of these, the Semantic Web becomes a very significant piece of technology so that we can probably solve some of the problems we couldn’t solve before. One could consider these enhanced capabilities [from Semantic Web technology] as a “souped up” BI [business intelligence].
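
The inferencing point can be made concrete with a small sketch. The Python below is an editorial illustration only, not Chevron's implementation: the class names, the instance platform_17, and the triple layout are all hypothetical. It applies two RDFS-style rules (subclass transitivity and type inheritance) until nothing new can be derived, which is roughly what "discovery on the basis of rules and axioms" means in practice.

```python
# Hypothetical mini-ontology; none of these names come from Chevron or ISO 15926.
FACTS = {
    ("DeepwaterPlatform", "subClassOf", "OffshorePlatform"),
    ("OffshorePlatform", "subClassOf", "ProductionFacility"),
    ("platform_17", "type", "DeepwaterPlatform"),
}


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        derived = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p1 == "subClassOf" and p2 == "subClassOf":
                    derived.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is also an instance
                # of that class's superclasses.
                elif p1 == "type" and p2 == "subClassOf":
                    derived.add((a, "type", d))
        if derived <= known:
            return known
        known |= derived


ALL_FACTS = infer(FACTS)

# Triples that were never stated explicitly but follow from the rules.
for triple in sorted(ALL_FACTS - FACTS):
    print(triple)
```

Nobody asserted that platform_17 is a ProductionFacility, yet a query for production facilities would now find it. A production system would more likely rely on an RDF store and reasoner than on a hand-written loop like this one.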

PwC: You mentioned the standardized metadata that’s been in development for a long time, the ISO 15926 standard. Are you making use of initiatives that predate your interest in the Semantic Web within an ontology context and mapping one ontology to another to provide that linkage?

FC: Yes, definitely. In my use case, I spell out what we call ontology-based information integration. Using ontologies to structure data actually increases flexibility, because you don’t have to have everything to begin with. You can model only what you need to for starters, enhance the ontology later, and then merge the ontologies together.
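
A rough sketch of why this "model a little, merge later" approach is workable, assuming data is held as subject-predicate-object triples with shared identifiers. Every URI, property name, and well or field name below is hypothetical and chosen only for illustration.

```python
# All URIs and property names below are hypothetical, for illustration only.
EX = "http://example.org/ontology/"

# First pass: model only wells and the fields they belong to.
wells_ontology = {
    (EX + "well/W-101", EX + "locatedIn", EX + "field/Alpha"),
    (EX + "well/W-102", EX + "locatedIn", EX + "field/Alpha"),
}

# Added later, perhaps by another team: who operates each field.
operations_ontology = {
    (EX + "field/Alpha", EX + "operatedBy", EX + "org/JointVentureX"),
}

# Both graphs use the same URI for the field, so merging is a set union.
merged = wells_ontology | operations_ontology


def operators_of_well(graph, well):
    """Follow well -> field -> operator through whatever graph is supplied."""
    fields = {o for (s, p, o) in graph if s == well and p == EX + "locatedIn"}
    return sorted(o for (s, p, o) in graph
                  if s in fields and p == EX + "operatedBy")


print(operators_of_well(wells_ontology, EX + "well/W-101"))  # nothing known yet
print(operators_of_well(merged, EX + "well/W-101"))          # answerable after merging
```

The first model did not have to anticipate the operator question. Once the second graph is merged in, the same query function answers it without any schema change.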

PwC: It’s difficult enough to get people to think as abstractly as metadata. And then going from metadata to ontologies is another conceptual challenge. Have you found it necessary to start with a training process, where you teach people about ontologies?

FC: This is a good question, because I’m involved in the master data management initiative at Chevron, too. We want to have shared definitions of concepts so that there is no ambiguity. And people need to agree with that shared definition. So in your own department you want to be in consensus with what the enterprise definition is for, let’s say, people or contractors or whatever.

It’s part of another project, what we call our conceptual information model. It looks at everything going on in Chevron, and we have developed 18 or 19 information classes. And then within these classes there are some 200 high-level categories that probably can describe all of Chevron’s business activities. So that is ongoing, but the semantic part is what actually provides the mapping of how one of these concepts or terms relates to another.

PwC: How does the conceptual information model relate to the master data management effort?

FC: It was a precursor to the master data management initiative, all part of what we call our enterprise information architecture. The key concern is shareability. Some of these concepts need to be shared among different departments, so we need to harmonize the conceptual information model across departments. That is the other approach. But in an ontology, we aren’t attempting to develop an all-inclusive, comprehensive information model of Chevron. We have more of a pragmatic approach in ontology building. We focus on building out what is needed for a specific solution, and we rely on the flexibility of ontologies that let us merge and stitch linked ontologies together for information integration.

PwC: Can you give us an early example where you had a limited scope and a specific problem, and you used the ontology approach to generate positive results and a deliverable benefit?

FC: In the case study and in the workshop, we looked at our UNIX file systems, which hold much of our technical data. The original idea was to think of it as merging—not just the UNIX file system, but also the Windows environment—so when you do a search, you’ll be searching both at the same time.

We started with the UNIX part. We scraped the directory structure, and then we were able to extract metadata from the file paths themselves, because of the way the users name the paths.

PwC: A folder system.

FC: Right. Folder systems, or what we call a file plan, in the sense that they contain certain metadata. And together with the file path, we know who created the file, and so we have a kind of people ontology. And then we have the project they’re working on and metadata information about when the file was created, when it was modified, and who did it. So we gathered all this information and put in place a system that described what a file was created for, who worked on it, for what project, at what time. We were then able to put in queries and ask who is working on certain projects at a certain time. This information was not in a database system, for example, but is implicit in the file metadata.
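
The general idea can be sketched in a few lines of Python. This is an editorial illustration, not the system Chevron built: the path convention (/data/project/owner/file), the property names, and the sample records are all assumptions made up for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileRecord:
    """What a directory scrape might yield for one file (hypothetical fields)."""
    path: str
    owner: str
    modified_year: int


def record_to_triples(rec):
    """Turn one scraped record into subject-predicate-object triples.

    Assumes a made-up naming convention: /data/<project>/<owner>/<file>.
    """
    parts = PurePosixPath(rec.path).parts  # ('/', 'data', <project>, ...)
    project = parts[2]
    return {
        (rec.path, "partOfProject", project),
        (rec.path, "createdBy", rec.owner),
        (rec.path, "modifiedInYear", str(rec.modified_year)),
    }


def people_on_project(triples, project, year):
    """Who touched files of a given project in a given year?"""
    in_project = {s for (s, p, o) in triples
                  if p == "partOfProject" and o == project}
    in_year = {s for (s, p, o) in triples
               if p == "modifiedInYear" and o == str(year)}
    return sorted({o for (s, p, o) in triples
                   if p == "createdBy" and s in (in_project & in_year)})


# Invented sample data standing in for a real directory scrape.
RECORDS = [
    FileRecord("/data/gulf_survey/asmith/seismic.dat", "asmith", 2007),
    FileRecord("/data/gulf_survey/bjones/wells.xls", "bjones", 2008),
    FileRecord("/data/north_sea/asmith/logs.txt", "asmith", 2008),
]

TRIPLES = set()
for rec in RECORDS:
    TRIPLES |= record_to_triples(rec)

print(people_on_project(TRIPLES, "gulf_survey", 2008))
```

No database ever recorded who was on which project. The answer falls out of facts that were implicit in paths and file metadata once they are expressed as linked statements.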

PwC: So are you looking at specific approaches to understanding the unstructured content itself, the information inside the files?

FC: Looking inside the files at the unstructured content is something we’ve talked about doing but we haven’t gotten there yet. There are an awful lot of different kinds of files in these repositories, and many of them are binary files that don’t contain easily recognizable text.

Finding widely applicable ways of getting information out of the contents may be a considerable challenge. That’s why we started where we did. We do, however, also have a lot of spreadsheets, and we’re looking at ways to link information in the spreadsheets to ontologies that link and organize the information.

PwC: It seems like you’d be able to take the spreadsheets and, in conjunction with a more structured source, make sense of those sheets.

FC: That’s not easy, however, because spreadsheets can be very diverse. One way is to build ontologies from those spreadsheets. With the appropriate tool, we could import them, figure out an ontology for them, and then externalize that ontology to link one spreadsheet to the others. Once we’re able to take these spreadsheets and externalize the metadata and so on, then we’d be able to integrate them directly into workflows.

PwC: This sounds like one of those situations where you have a tool that’s helping you with, say, spreadsheets, and it’s not going to be 100 percent correct every time. It’s going to be a bit messy. So how do you approach that? Is there a feedback loop or a Web 2.0 quality to it, something that makes it a self-correcting process?

FC: We haven’t gotten that far yet, but I assume that is our next step. The University of Texas has implemented a decision support system in spreadsheets. We also run decision support systems on spreadsheets, but The University of Texas implemented it in an ontology-based semantic technology, and it’s very innovative. But we are getting there. We’re taking baby steps and are always looking at how we can use this technology.

PwC: How far is this effort from the actual end users, such as an engineer or an earth scientist? If you were to say “ontology” to someone in that role at Chevron, do their eyes roll up into the back of their heads? Are they familiar with this effort, or is this a back-room effort?

FC: Well, I would say it was that way two years ago. They would call it the O word. “You’re bringing out the O word again.” Everyone said that. We have since made people aware of what this is. A major goal of the W3C workshop was to build awareness not just for Chevron but throughout the industry. That’s part of the objective: to get not only Chevron comfortable with the Semantic Web, but also BP, Total, Shell, and so forth. Within the Chevron community, there is more and more interest in it, especially on the information architecture side of things. People are interested in how ontologies can help and what an ontological approach brings that traditional approaches don’t, such as EII [enterprise information integration]. When we say data integration, they respond, “Aren’t we already doing it with EII? Why do we need this?” And so we are having this dialogue.

PwC: And what is your answer in that situation? What do you say when somebody says, “Aren’t we already doing EII?”

FC: In EII, you get only what is already there. The Semantic Web works with what we call open world reasoning instead of closed world reasoning. In databases, in EII, you’re connecting the data that you have. But with the Semantic Web, you’re connecting to much more information. Instead of saying, “If it’s not in the database, it’s false,” we only say that we don’t know the answer. Linking to new information is far easier. We are better able to integrate it, leading to a better business decision.
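
The difference between the two assumptions can be shown with a toy example. The sketch below is illustrative only; the field names, the hasRockType property, and the two helper functions are hypothetical and do not correspond to any particular database or reasoner API.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Statements known to hold, and statements known NOT to hold (invented data).
FACTS = {("field_A", "hasRockType", "sandstone")}
NEGATED_FACTS = {("field_A", "hasRockType", "shale")}


def closed_world_ask(facts, statement):
    """Database-style answer: anything not recorded is treated as false."""
    return Answer.TRUE if statement in facts else Answer.FALSE


def open_world_ask(facts, negated_facts, statement):
    """Open world answer: absence of a statement means 'we don't know'."""
    if statement in facts:
        return Answer.TRUE
    if statement in negated_facts:
        return Answer.FALSE
    return Answer.UNKNOWN


# Nothing has been recorded about field_B at all.
question = ("field_B", "hasRockType", "sandstone")
print(closed_world_ask(FACTS, question))
print(open_world_ask(FACTS, NEGATED_FACTS, question))
```

Under the closed world assumption the missing record about field_B reads as a "no". Under the open world assumption it stays an open question that newly linked information can later settle.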

PwC: We talked to Tom Scott at BBC Earth. His primary focus is on leveraging information about music and artists that is already on the Semantic Web. He wouldn’t describe himself as an IT person. He’s more of a product manager. Is that something you see in some of the people you work with?

FC: Definitely. For example, some of the people we work with are geoscientists. Among them there’s already a big effort called GEON [Geosciences Network]. They are building ontologies for the different earth structures and trying to use them within our IT environment.

PwC: It sounds like in your case you have the best of both worlds. You have the central function thinking about leveraging semantic technologies, and then you have people in the earth sciences domain who are living in that domain of knowledge all the time.

FC: Yes, the best thing about them is that they know their pain point—what they want done—and they are constantly thinking about what can help them to solve the problem. In another sense, the earth science SMEs [subject matter experts] know that they need to be able to describe the world in ways that can be shared with other people and be understandable by machines. So they have this need, and they look into this, and then they call us and say, “How can we work with you on this?”

PwC: Do you work with them mostly graphically, using friend-of-a-friend bubble charts and things like that? Is that how you typically work with these domain folks?

FC: The tools that support Semantic Web initiatives are getting more and more sophisticated. We have SMEs who want to get a copy of the ontology modeling tool. They want a copy of that so they can work with it.

PwC: These are business users who want to get their own ontology modeling tool? How do they even know about it?

FC: Well, we [IT] do our modeling with it. We showed them an ontology and validated it with them, and then they said, “Whoa, I haven’t—this is good, good stuff,” so they wanted to be involved with it too and to use it.

In the Chevron world, there are a lot of engineers. They are part of the Energy Technology Company [ETC], and we are part of ITC, the Information Technology Company. ETC has some more people who have domain knowledge and also want to experiment with the new tools. As soon as we show them, they want it. Before, they were looking at another knowledge modeling tool, but the ontology tool is really capable of making inferences, so they want that, and now we are getting more and more licenses and using it.

PwC: Do you sense some danger that we could have a lot of enthusiasm here and end up with a lot of non-compatible ontologies? Are we going to enter a period where there will need to be some sort of master data model, a master ontology model effort?

FC: We already defined some standards to address that. We have a URI [Uniform Resource Identifier] standard for how you name ontologies, and it’s referenceable so that you can go into that URI and retrieve the ontology. We tried to make that shareable, and we are also starting a community type of space.

PwC: Is it discoverable somehow? If some employees somewhere in Saudi Arabia decide they need to get started with an ontology, would there be an easy way for them to find other ontologies that already exist?

FC: We’re standardizing on a [Microsoft] SharePoint platform, so we have a SharePoint site on information discovery that has Semantic Web or entity extraction for these unstructured texts and different analytics. We have publicized that through communications, and we have people posting their work there. We try to make use of the collective intelligence kind of notion, like Wikipedia—have people come to it and have a discussion.

PwC: So you’re taking the Semantic Web out of this innovation research program within Chevron and moving it into the delivery side of the organization?

FC: We have a number of projects along the continuum that runs from innovation and strategic research through proof of concept and pilot to technology delivery.

PwC: Is there a specific part of your business ecosystem where you are deploying it first?

FC: Yes. We are partnering with USC and have formed an organization called CiSoft [Center for Interactive Smart Oilfield Technologies, a joint venture of the University of Southern California and the Chevron Center of Excellence for Research and Academic Training]. We have created an application called Integrated Asset Management that uses Semantic Web technology to help with tasks associated with reservoir management. The end users don’t see anything that they would recognize as the Semantic Web, but under the covers it is enabling the integration of information about these assets.

PwC: You’re pretty confident that your initial Semantic Web applications are going to be in production and successful and part of the fabric of Chevron?

FC: Just because something performs well in a proof of concept, or pilot, doesn’t mean that it’s going to do well in production, right? We’re looking at scalability. That’s one of the big questions. When you’re dealing with billions of RDF triples, you wonder if it is going to give you the response time you need. We’re learning how to address this issue.