Importing Ontologies in P41

From Protege Wiki
Revision as of 16:53, December 3, 2012 by Tredmond (talk | contribs) (Protege and XML Catalogs)

Importing Ontologies in Protege 4.1

In this section, I will walk you through adding an import to an ontology. The purpose of this page is to illustrate OWL 2.0 imports and explain the design of the Protege 4.1 import mechanism. Note that there is also a separate How_Owl_2.0_Imports_Work page describing OWL 2.0 imports, which has not yet been integrated with this page. First I will introduce some terminology.

Definitions

An import declaration is the OWL statement that is used to import an ontology. Here is a simplified version of the ontology declaration from the pizza ontology:

    <owl:Ontology rdf:about="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
        <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
    </owl:Ontology>

This declaration states that the pizza.owl ontology imports all the assertions found in the protege ontology. The OWL 2 language specification has the following to say about the IRI given in the import declaration (http://protege.stanford.edu/plugins/owl/protege in this case):

these IRIs identify the ontology documents of the directly imported ontologies as specified in Section 3.2.

And in section 3.2 of the OWL 2 language specification we have the following expansion of this definition:

  • Each ontology document can be accessed via an IRI by means of an appropriate protocol.
  • Each ontology document can be converted in some well-defined way into an ontology (i.e., into an instance of the Ontology UML class from the structural specification).

While this description leaves much of the meaning of an import declaration to the implementation, the reference to an appropriate protocol makes it fairly clear that the ontology to be imported is found by chasing down an ontology location. So in the case of the pizza ontology, the protege import should be treated as a URL, and the ontology to be imported is found using the HTTP protocol.
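As a minimal sketch of what "chasing down" starts from, the following Python reads the import declarations out of an ontology header like the one above. It assumes the ontology is serialized as RDF/XML; the rdf:RDF wrapper and the function name import_declarations are my additions for illustration and are not part of Protege.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Simplified pizza header from the text, wrapped in the rdf:RDF root
# element that a complete RDF/XML document needs.
PIZZA_HEADER = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
    <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
  </owl:Ontology>
</rdf:RDF>"""


def import_declarations(rdf_xml):
    """Return the IRIs named by owl:imports in an RDF/XML ontology header."""
    root = ET.fromstring(rdf_xml)
    iris = []
    for ontology in root.iter(f"{{{OWL}}}Ontology"):
        for imp in ontology.findall(f"{{{OWL}}}imports"):
            resource = imp.get(f"{{{RDF}}}resource")
            if resource is not None:
                iris.append(resource)
    return iris


IMPORTS = import_declarations(PIZZA_HEADER)
```

Each IRI returned is what a tool would then try to access, directly or through a redirection mechanism.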

The physical location of an ontology is the location where the source for the ontology is found and loaded from. It may appear from the above definition of the import declaration that the physical location of an imported file should simply be the location indicated by the import declaration. But often ontology developers want to have imports redirected to a physical location on their local machine. The OWL 2 specification makes allowances for this:

OWL 2 tools will often need to implement functionality such as caching or off-line processing, where ontology documents may be stored at addresses different from the ones dictated by their ontology IRIs and version IRIs. OWL 2 tools MAY implement a redirection mechanism: when a tool is used to access an ontology document at IRI I, the tool MAY redirect I to a different IRI DI and access the ontology document via DI instead.

In a separate email discussion, the working group appeared to recommend XML catalogs as a redirection mechanism, and the XML Catalog specification is at the core of the Protege 4.1 import redirection mechanism.
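To make the redirection from I to DI concrete, here is a rough Python sketch that looks up an IRI in an XML catalog. The catalog namespace and the uri element with its name and uri attributes come from the OASIS XML Catalogs standard; the local path imports/protege.owl and the function name redirect are hypothetical. A real resolver also handles xml:base, groups, rewrite rules and nested catalogs, none of which is attempted here.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the local path "imports/protege.owl" is made up.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">
  <uri name="http://protege.stanford.edu/plugins/owl/protege" uri="imports/protege.owl"/>
</catalog>"""


def redirect(catalog_xml, iri):
    """Return the redirected IRI DI for I, or I itself if no uri entry matches."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        if entry.get("name") == iri:
            return entry.get("uri")
    return iri
```

An import that has no matching entry is returned unchanged, which corresponds to fetching it from the location named in the import declaration.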

Finally, every ontology has an ontology name and ontology version. These are described here. There is no direct connection between import declarations and the name of the ontology being imported. However, the OWL 2 specification does indicate that physical locations for an ontology can be determined by the name or version of the ontology:

  1. If O contains an ontology IRI OI but no version IRI, then the ontology document of O SHOULD be accessible via the IRI OI.
  2. If O contains an ontology IRI OI and a version IRI VI, then the ontology document of O SHOULD be accessible via the IRI VI; furthermore, if O is the current version of the ontology series with the IRI OI, then the ontology document of O SHOULD also be accessible via the IRI OI.

These statements suggest the possibility that the import declaration can use the imported ontology's name or version to do the import. I personally believe that this is the best way to do imports, as it makes it easier to share ontologies when the user does not have access to the ontology's home network (e.g. the user is offline).
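The two rules quoted above can be written down as a small Python function. This is only my reading of the quoted text, not code from any OWL tool; the function name, the is_current parameter and the example.org IRIs in the usage note are hypothetical.

```python
def expected_document_iris(ontology_iri, version_iri=None, is_current=True):
    """IRIs at which the ontology document SHOULD be accessible.

    Rule 1: no version IRI, so the document should be at the ontology IRI.
    Rule 2: with a version IRI, the document should be at the version IRI,
            and also at the ontology IRI if it is the current version.
    """
    if version_iri is None:
        return [ontology_iri]
    iris = [version_iri]
    if is_current:
        iris.append(ontology_iri)
    return iris
```

For example, expected_document_iris("http://example.org/travel.owl") yields just that IRI, while passing a version IRI as well puts the version IRI first. Either result is a candidate for use in an import declaration.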

Importing an ontology with Protege 4.1+

Here we will describe the steps involved in adding an import from the web. We assume that you have some ontology already loaded in Protege 4.1. Go to the Active Ontologies tab and click on the plus sign beside the Direct Imports text. This will pull up the following dialog box.

Protege41ImportWizardChooseType.png

We will consider the third option in this dialog first because it is by far the simplest. That is not to say that this option is the most commonly used option; the fourth option is also very common.

Importing an ontology from its web location

We will be choosing the third option in this dialog: "Import an ontology contained in a document located on the web". Select this option, click "continue", and enter the URI http://protege.cim3.net/file/pub/ontologies/travel/travel.owl in the URI box.

Protege41ImportWizardURL.png

Now when you click "continue", Protege 4.1 will calculate the name and version of the ontology at the http://protege.cim3.net/file/pub/ontologies/travel/travel.owl location. The purpose of this step is two-fold. First, it provides some reasonable verification that a real OWL ontology can be found at the location provided. Second, the result is used to offer the possible ways in which the desired ontology can be imported. The calculation of the ontology name can take a bit of time, so you may see a transitory import verification page come up while the ontology name and version are calculated. When the calculation is complete, a new page will come up which allows you to specify the import declaration.

Protege41ImportWizardChooseDeclaration.png

The first of these two options is suggesting that the travel.owl ontology be imported as follows:

    <owl:Ontology rdf:about="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
        <owl:imports rdf:resource="http://www.owl-ontologies.com/travel.owl"/>
        <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
    </owl:Ontology>

This should be a valid way to import travel.owl because the name of the ontology should be a location where the travel.owl ontology can be found.

The second of these two options is suggesting that the travel.owl ontology be imported as follows:

    <owl:Ontology rdf:about="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
        <owl:imports rdf:resource="http://protege.cim3.net/file/pub/ontologies/travel/travel.owl"/>
        <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
    </owl:Ontology>

In this case we know that the travel.owl ontology will be found at this location because this is where we are loading the travel.owl ontology from. I believe that it is usually better to use the ontology name for the import.

But there are some reasons why the second version might be preferred. For instance, if the import is being added to an ontology being used by developers, one might want to point to the svn or webdav location where the ontology is being modified rather than the location where the ontology will ultimately be placed. In this particular case, the ontology name does not work as an import because the ontology cannot be found at that location, so we must use the second option. I select the second option and click on continue. This brings me to the final page of the wizard.

Protege41ImportWizardFinal2.png

This page tells me that my import

Click Finish and the import will be added.

In summary, to import an ontology we went through the following steps:

  1. we chose a physical location for the ontology that we wanted to import,
  2. we determined the name and version of the ontology at that location,
  3. optionally, we selected how we wanted to declare the import in the ontology where the import was being added, and finally
  4. we looked over the results and committed the import or cancelled the dialog.

All the import wizards follow this pattern.
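The pattern can be sketched in Python as follows. This is not the wizard's code: it assumes an RDF/XML document, works on a string rather than fetching from the web, and the names ontology_id and candidate_declarations are hypothetical. TRAVEL_DOC is a made-up stand-in for the real travel.owl; only its ontology name is taken from the text above. Offering the version IRI as a further candidate is my assumption.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-in for the document found at the physical location.
TRAVEL_DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.owl-ontologies.com/travel.owl"/>
</rdf:RDF>"""


def ontology_id(rdf_xml):
    """Step 2: return (name, version) of the ontology in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    ontology = root.find(f"{{{OWL}}}Ontology")
    if ontology is None:
        return None, None
    name = ontology.get(f"{{{RDF}}}about")
    version_el = ontology.find(f"{{{OWL}}}versionIRI")
    version = None
    if version_el is not None:
        version = version_el.get(f"{{{RDF}}}resource")
    return name, version


def candidate_declarations(physical_location, rdf_xml):
    """Step 3: list the IRIs that could be written into owl:imports."""
    name, version = ontology_id(rdf_xml)
    candidates = []
    for iri in (name, version, physical_location):
        if iri and iri not in candidates:
            candidates.append(iri)
    return candidates
```

Called with the travel.owl web location and TRAVEL_DOC, this returns the ontology name followed by the physical location, matching the two choices shown in the wizard.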

Protege and XML Catalogs

At some point after the OWL 2 specification became final, Protege 4.1 adopted the XML Catalog standard as a way of recording how it redirects imports while loading an ontology. The goals of the Protege 4.1 XML Catalog implementation are as follows:

  • The implementation must be transparently compatible with the Protege 4.0 import mechanism, which seems to have been well liked.
  • After opening an OWL file from a filesystem, there will be an XML catalog file that can be used by any XML Catalog-aware application to replicate the imports that were performed by Protege. In particular, the XML Catalog library can be used to enable the OWL API to use XML Catalogs.
  • There must be a clear path by which the Protege 4.0 import mechanism can be enhanced. For example, Protege 4.2 allows OWL ontologies in a directory, A, to import ontologies in a directory, B, where B is not a direct or indirect child of A. Protege 4.2 also supports import-by-name. Neither of these features was supported in Protege 4.0.
  • The implementation must provide a means by which a user can override redirects calculated by Protege and provide their own declaration of how import declarations should be redirected. For example, as of this writing, the OBI ontology does exactly this with its catalog.
  • The implementation must provide a means by which a user can prevent Protege from doing any automatic redirection (by emptying the catalog).

We will cover each of these points in turn. First, when a user opens an ontology on his local file system, the Protege 4.0 imports redirection capability would first perform a scan of the directory containing the ontology. The result of this scan would be an association of ontologies with names by which they could be imported. For each import declaration found during the parse of the ontology, the Protege 4.0 implementation would try to see if the import declaration matched any of the names associated with the files in the local file system. If a match was found then the Protege 4.0 mechanism would load the file on the local disk as the import.
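The scan-and-match behaviour described here can be approximated in a few lines of Python. This is a sketch of the idea only, not the Protege 4.0 implementation: it assumes RDF/XML files with an .owl extension, descends into child directories, uses only the ontology name for matching, and keeps the first file found when two files declare the same name. All function names and the sample document are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-in for an ontology file sitting in the scanned directory.
TRAVEL_DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.owl-ontologies.com/travel.owl"/>
</rdf:RDF>"""


def ontology_name(path):
    """Return the rdf:about of the owl:Ontology element in a file, or None."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None
    ontology = root.find(f"{{{OWL}}}Ontology")
    return None if ontology is None else ontology.get(f"{{{RDF}}}about")


def scan_directory(directory):
    """Associate each ontology name found under `directory` with its file."""
    names = {}
    for folder, _subdirs, files in os.walk(directory):
        for filename in sorted(files):
            if filename.endswith(".owl"):
                path = os.path.join(folder, filename)
                name = ontology_name(path)
                if name is not None:
                    names.setdefault(name, path)  # first file wins
    return names


def resolve_import(import_iri, names):
    """Return the local file for an import, or None to fall back to the web."""
    return names.get(import_iri)


def demo():
    """Write one ontology to a temporary directory and resolve an import to it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "travel.owl")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(TRAVEL_DOC)
        names = scan_directory(tmp)
        return resolve_import("http://www.owl-ontologies.com/travel.owl", names) == path
```

Note that nothing is written to disk by the scan itself, so the result cannot be inspected or overridden between loads.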

Since the Protege 4.0 mechanism would not write any configuration data to any local configuration file, this technique had some disadvantages. In particular, it was not possible in Protege 4.0 to change or disable how the import redirection occurred for the ontologies in a particular directory. For instance, some users would have directories containing several ontologies with the same name. In this situation, the user would have to manually configure the import redirection at each load. Other projects, e.g. OBI, would put imported ontologies in a directory that was not a child directory. As a result, the Protege 4.0 mechanism would never find the on-disk location for the imports and would always go to the web.

For these and other reasons, Protege 4.1+ uses a configuration file called catalog-v001.xml to describe how imports are redirected. We chose the XML Catalog format for this file because it is an industry standard format for specifying URL redirection.

To mimic the Protege 4.0 mechanism, we defined a type of plugin, a repository plugin, that is allowed to update certain constrained portions of the catalog. Just before loading an ontology from the disk, Protege 4.1+ will give each repository a chance to update a field in the appropriate XML catalog. Thus for instance, a typical XML catalog generated by Protege will contain a group entry that looks something like the following:

     <group id="Folder Repository, directory=, recursive=true, Auto-Update=true, version=2" prefer="public" xml:base="">
        <uri id="Automatically generated entry, Timestamp=1330383331040" name="...

When Protege 4.1+ loads an ontology in the same directory as this catalog, the "Folder Group Manager" plugin will recognize that it needs to update this group entry because

  1. it recognizes that the first part of the name of the group "Folder Repository" indicates that it is the creator and manager of this particular group entry.
  2. it sees that the Auto-Update flag is set to true, allowing this group entry to be modified.

If either of these conditions fails, then the "Folder Group Manager" plugin will make no changes. If the "Folder Group Manager" plugin does update this group entry, then it does so by

  1. scanning the disk to associate files with the names by which they could be imported,
  2. writing each of these associations to a uri-entry underneath the group entry indicating that a particular import should be redirected to a particular ontology file.

After all the repository plugins have had their say, Protege loads the ontology and uses the updated XML catalog to resolve any imports.
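The two checks and the update step can be sketched in Python. This is an illustration of the logic described above, not the plugin's code: how the plugin actually parses the group id is not documented here, so splitting the id on commas is an assumption, as is replacing all existing uri entries on each update. The function names, the second group with Auto-Update=false and the travel.owl association are hypothetical, and the id attribute that Protege writes on generated uri entries is omitted.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog with one updatable and one frozen group entry.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">
  <group id="Folder Repository, directory=, recursive=true, Auto-Update=true, version=2" prefer="public" xml:base=""/>
  <group id="Folder Repository, directory=, recursive=true, Auto-Update=false, version=2" prefer="public" xml:base=""/>
</catalog>"""


def is_managed_group(group):
    """Check both conditions: our group, and Auto-Update switched on."""
    fields = [part.strip() for part in group.get("id", "").split(",")]
    return fields[0] == "Folder Repository" and "Auto-Update=true" in fields


def update_group(group, names):
    """Rewrite the uri entries of a managed group from name -> file pairs.

    Returns True if the group was updated and False if it was left alone.
    """
    if not is_managed_group(group):
        return False
    for entry in list(group):
        group.remove(entry)  # assumption: stale entries are discarded
    for name, path in sorted(names.items()):
        ET.SubElement(group, f"{{{CATALOG_NS}}}uri", {"name": name, "uri": path})
    return True
```

A user who sets Auto-Update to false, or who empties the catalog altogether, therefore keeps full control over how imports are redirected.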

Under Construction

Things to cover - motivation, catalog-v001.xml, when Protege modifies the catalog

Import By Name