Difference between revisions of "How Owl Imports Work"

From Protege Wiki

Revision as of 10:43, April 28, 2008

OWL Imports

First, it must be understood that the semantics of imports in OWL is a subject of some controversy. An effort is in progress in the OWL 1.1 specification to clear up some of this confusion. Until this is settled, we will use the definition of the semantics of imports given in the semantics document of the w3.org specs:

Aside from this local meaning, an owl:imports annotation also imports the contents of another OWL ontology into the current ontology. The imported ontology is the one, if any, that has as name the argument of the imports construct. (This treatment of imports is divorced from Web issues. The intended use of names for OWL ontologies is to make the name be the location of the ontology on the Web, but this is outside of this formal treatment.)

In this note we will describe what this innocent little paragraph is saying and what it means to Protege.

Names of Ontologies

All OWL ontologies have a name. The w3.org document describing this naming process can be found at this link. Unfortunately, the way that this name is calculated (at least for the RDF/XML OWL syntax) is not well specified. Even so, in most cases it can be calculated in an unambiguous way. In most RDF/XML ontologies there is a single RDF resource that is declared to be of type owl:Ontology. For instance, in the pizza OWL ontology, the declaration in question is the following:

 <owl:Ontology rdf:about="">
   <protege:defaultLanguage rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
   >en</protege:defaultLanguage>
   <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
   >version 1.3</owl:versionInfo>
   <rdfs:comment xml:lang="en">An example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by Manchester University (see http://www.co-ode.org/resources/tutorials/)</rdfs:comment>
   <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
 </owl:Ontology>

The first line of this declaration defines an RDF resource of type owl:Ontology. The name of this resource is given by the rdf:about attribute. In this case the rdf:about string is empty, which means that the xml:base is used. The xml:base declaration can be found near the top of the pizza ontology, alongside the namespace declarations:

<rdf:RDF
   xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:owl="http://www.w3.org/2002/07/owl#"
   xmlns="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl#"
   xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
 xml:base="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">

No other declarations of an owl:Ontology resource occur in the pizza ontology, so the name of this resource (i.e. http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl) is the name of the pizza ontology. It is relevant to this discussion that the name of the pizza ontology is a working URL and that clicking on the link pulls up the pizza ontology. In summary:

  • the pizza ontology contains exactly one owl:Ontology declaration,
  • this declaration is named by rdf:about="",
  • the xml:base is declared and
  • the xml:base is a working URL that points to the pizza ontology.

This situation is the simplest case and in this case things tend to work very smoothly.

However, none of the assumptions above are guaranteed to hold. Sometimes there are no owl:Ontology declarations in the OWL ontology. In this case the document base is used as the name of the ontology. The document base is given by the xml:base declaration or, if no xml:base declaration is present, by the physical location that was used to retrieve the ontology. If this is still ambiguous, I am not sure what happens.

In addition, sometimes the rdf:about attribute does not use the empty string. This allows one to give the ontology a name other than the xml:base. Finally, the name of the ontology may only be a URI and may not resolve to the right ontology when used as a URL. Each of these cases adds some difficulty for ontology developers and to some degree motivates the writing of this document.

There is another case that has been troublesome in the past. Occasionally there is more than one owl:Ontology declaration. In this case tools sometimes get confused and give different results. Protege works by taking the first declaration of an owl:Ontology individual to be the name of the ontology. This would be fine except that the notion of first declaration in RDF/XML is not well defined.
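
The naming rules described in this section can be sketched in a few lines of Python. This is only an illustration of the cases discussed above, not the code that Protege uses. The function name ontology_name and its retrieved_from parameter are made up for the example, and the sketch assumes that the header is written as an owl:Ontology element directly under rdf:RDF (it ignores headers written as rdf:Description with an rdf:type). It also treats a missing rdf:about like an empty one, which is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Attribute and element names in ElementTree's {namespace}local notation.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
OWL_ONTOLOGY = "{http://www.w3.org/2002/07/owl#}Ontology"


def ontology_name(rdf_xml, retrieved_from):
    """Return the ontology name for an RDF/XML document held in a string.

    retrieved_from is the physical location the document was read from.
    It is only used when the document declares no xml:base.
    """
    root = ET.fromstring(rdf_xml)
    # Document base: xml:base if declared, else the retrieval location.
    base = root.get(XML_BASE, retrieved_from)
    # Take the first owl:Ontology element in document order. "First" is a
    # property of the XML file here, not of the RDF graph.
    header = root.find(OWL_ONTOLOGY)
    if header is None:
        # No owl:Ontology declaration: the document base is the name.
        return base
    # An empty rdf:about resolves to the base; anything else is resolved
    # against it (absolute URIs are returned unchanged).
    return urljoin(base, header.get(RDF_ABOUT, ""))
```

For the pizza header shown earlier, with its empty rdf:about and declared xml:base, this sketch returns the xml:base value.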

Import By Name

If we look at the owl:Ontology header of the pizza ontology, we see that the declared owl:Ontology resource is related to another resource by the owl:imports property:

 <owl:Ontology rdf:about="">
   <protege:defaultLanguage rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
   >en</protege:defaultLanguage>
   <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
   >version 1.3</owl:versionInfo>
   <rdfs:comment xml:lang="en">An example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by Manchester University (see http://www.co-ode.org/resources/tutorials/)</rdfs:comment>
   <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
 </owl:Ontology>

As triples the owl:imports statement would look like this:

<http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl> owl:imports <http://protege.stanford.edu/plugins/owl/protege>.

This statement tells tools such as Protege or reasoners that the pizza ontology should import an ontology whose name is

http://protege.stanford.edu/plugins/owl/protege.

So now the problem is "how do we find an ontology with this name?". The easy answer would be to simply use the name of the imported ontology as a URL and fetch the imported ontology in this way. Most of the time this works (as it does in this example), but in general the retrieval is not that simple.

Suppose that you are on a train commuting home from work with no internet connection and you decide to edit the pizza ontology. If Protege uses the URL to try to retrieve the ontology it will fail. In this case, this will not be that inconvenient. The user can still edit. But with more complex ontologies with more critical imports (e.g. the birnlex ontology) not finding the imports may make working on the ontology impossible.

In this case, both Protege 3 and Protege 4 (and most other OWL ontology tools) have a mechanism by which a user can tell the tool where to search for an ontology with the right name. In Protege 3 the spaces of ontologies to search are called ontology repositories. In Protege 4 they are called ontology libraries. A typical ontology repository or library will consist of all the ontologies in a given directory on the disk. When Protege is trying to resolve an import, it will first search in all the defined ontology repositories for an ontology whose name matches the desired imported name. If it doesn't find any matches, then it tries the ontology name as a URL. If this fails, then the import will fail. This is not generally a fatal error; it just means that Protege does not have access to the assertions contained in that import.
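
The lookup order just described (repositories first, then the name as a URL, then failure) can be sketched as follows. This is a rough illustration and not Protege's implementation. The names name_of, index_repository and resolve_import are invented for the example, a repository is modelled as a plain dictionary built from the .owl files in one directory, and the URL fetch is passed in by the caller so that the sketch does not assume any particular network library.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
OWL_ONTOLOGY = "{http://www.w3.org/2002/07/owl#}Ontology"


def name_of(path):
    """Ontology name of a local RDF/XML file (simplified naming rules)."""
    root = ET.parse(path).getroot()
    base = root.get(XML_BASE, Path(path).resolve().as_uri())
    header = root.find(OWL_ONTOLOGY)
    about = "" if header is None else header.get(RDF_ABOUT, "")
    return urljoin(base, about)


def index_repository(directory):
    """Map ontology name -> file for every .owl file in one directory."""
    return {name_of(p): p for p in sorted(Path(directory).glob("*.owl"))}


def resolve_import(name, repositories, fetch_url):
    """Resolve an imported ontology name.

    Returns ("repository", path) if a repository holds an ontology with
    that name, ("url", content) if fetching the name as a URL worked, and
    ("missing", None) otherwise. fetch_url is a caller-supplied function
    that takes a URL and returns its content or raises OSError/ValueError.
    """
    for repo in repositories:
        if name in repo:
            return "repository", repo[name]
    try:
        return "url", fetch_url(name)
    except (OSError, ValueError):
        # Not fatal: the assertions of this import are simply unavailable.
        return "missing", None
```

A caller with network access could pass something like lambda url: urllib.request.urlopen(url).read() as fetch_url. On the train, the same call still succeeds for every ontology that is present in a repository.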

The repository mechanism for Protege 3 is described at link. The Protege 4 ontology library documentation will be coming soon.

What Can Go Wrong

There are two main things that can go wrong. First, Protege may fail to find the imported ontology. Second, Protege may follow the above algorithm but end up retrieving an ontology with the wrong name.

Ontology Not Found

The ontology

http://protege.stanford.edu/junitOntologies/testset/NonExistentImport.owl

has an import statement that points to an ontology that does not exist. In this case both Protege 3 and Protege 4 will generate a dialog with the user to find the location of the imported file. The user has the choice of either ignoring the import or pointing Protege to a location where the imported ontology can be found.

In Protege 3 the dialog looks like this: HowOwlImportsWorkProtege3NotFound.png

In Protege 4 the dialog looks like this: HowOwlImportsWorkProtege4NotFound.png

Ontology Found Has The Wrong Name

The ontology at the location

http://protege.stanford.edu/junitOntologies/testset/uglyImport.owl

has an import statement that imports an ontology with the wrong name. More specifically the ontology header is as follows:

 <owl:Ontology rdf:about="">
   <owl:imports rdf:resource="http://protege.stanford.edu/junitOntologies/testset/travel.owl"/>
 </owl:Ontology>

However if you follow the link http://protege.stanford.edu/junitOntologies/testset/travel.owl it turns out that the name of the ontology in this location is http://www.owl-ontologies.com/travel.owl.

So in this case, if Protege followed the specification of imports described above, it would not import any statements from the ontology found at the location http://protege.stanford.edu/junitOntologies/testset/travel.owl. The ontology found at that location is not the ontology requested by the import, so the import is not found. Strictly speaking, this would be the correct behavior for the Protege tool.

However, this behavior is almost never what the user wants. So in both Protege 3 and Protege 4 we adopted a compromise position. We import the assertions from the ontology

http://www.owl-ontologies.com/travel.owl

that we found at the location http://protege.stanford.edu/junitOntologies/testset/travel.owl and we leave a visual cue in the imports tree for the ontology that the import is broken.
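
The compromise can be pictured with a small sketch: read the document found at the import's location, compute its declared name, import it anyway, and remember whether the name differs from the one requested. This is an illustration only. ImportResult and load_import are hypothetical names, and the simplified naming rules below assume that the header is an owl:Ontology element directly under rdf:RDF.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
OWL_ONTOLOGY = "{http://www.w3.org/2002/07/owl#}Ontology"


@dataclass
class ImportResult:
    requested: str  # the name given in the owl:imports statement
    found: str      # the name declared by the document that was retrieved
    broken: bool    # True when the two differ; contents are still imported


def load_import(requested, rdf_xml):
    """Compare the requested name with the name of the retrieved document.

    rdf_xml is the RDF/XML text fetched by using `requested` as a URL.
    """
    root = ET.fromstring(rdf_xml)
    # With no xml:base, the retrieval location (here, the requested name)
    # serves as the document base.
    base = root.get(XML_BASE, requested)
    header = root.find(OWL_ONTOLOGY)
    about = "" if header is None else header.get(RDF_ABOUT, "")
    found = urljoin(base, about)
    return ImportResult(requested, found, broken=(found != requested))
```

A tool following this sketch would use the broken flag to decide whether to draw the visual cue in its imports tree.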

In Protege 4 the visual cue is the little green triangle to the right in the diagram below: HowOwlmportsWorkProtege4Broken.png

A nice feature of Protege 4 is that clicking on the green triangle will bring up a dialog that asks if you want to fix the import: HowOwlmportsWorkProtege4Fix.png

Why Not Import By Location?

So you want to get controversial eh? This is where there are different opinions.

Import by location is a model where the name of an ontology is irrelevant to the import statement. The import statement simply gives a physical address at which to look for the ontology. Several people have argued that this is the right model.

The one case where import by location shines and import by name does very badly is the following:

  1. The internet is trusted, available and reliable, ontologies are never relocated and all the ontologies of interest are on the internet.

    With import by location, the import will always be found. With import by name, a person reading an ontology off the web may not be able to determine where to find the imported ontology.
    I will lump the other use cases together, although they may have important differences.
  2. I am commuting home from work with no internet access and unzip a collection of owl files.
  3. I am developing an application which may not have access to the internet and/or may not be willing to trust the internet even if it had access.
  4. I have access to the internet but I want to edit several ontologies (it must be more than one) that I download off the web.
  5. Web servers, projects and organizations come and go and ontologies are relocated.

In these cases, to varying degrees, import by name works very well and this is why I think it is the right choice. (Cases 2 and 3 are close to my heart.) Consider use case 2 because in some sense it is an extreme. In this case - with import by name - I simply plop the owl files on my disk and my tool can easily determine which ontologies import which. It just needs to parse the import statements and the ontology declarations from the files in question.
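
To make use case 2 concrete, here is a minimal sketch of what a tool might do with a directory of unzipped owl files under import by name: parse each ontology declaration and its import statements, then work out which imports are satisfied locally. The function names import_graph and unresolved are invented for the example, and the sketch assumes that each header is an owl:Ontology element directly under rdf:RDF.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
OWL_ONTOLOGY = "{http://www.w3.org/2002/07/owl#}Ontology"
OWL_IMPORTS = "{http://www.w3.org/2002/07/owl#}imports"


def import_graph(directory):
    """Map each ontology name to (file, list of imported ontology names)."""
    graph = {}
    for path in sorted(Path(directory).glob("*.owl")):
        root = ET.parse(path).getroot()
        base = root.get(XML_BASE, path.resolve().as_uri())
        header = root.find(OWL_ONTOLOGY)
        if header is None:
            # No declaration: the document base is the name, no imports.
            graph[base] = (path, [])
            continue
        name = urljoin(base, header.get(RDF_ABOUT, ""))
        imports = [
            urljoin(base, imp.get(RDF_RESOURCE))
            for imp in header.findall(OWL_IMPORTS)
            if imp.get(RDF_RESOURCE) is not None
        ]
        graph[name] = (path, imports)
    return graph


def unresolved(graph):
    """Imported names for which no file in the directory declares that name."""
    return {
        imported
        for _path, imports in graph.values()
        for imported in imports
        if imported not in graph
    }
```

Nothing besides the owl files themselves is needed here; no extra mapping file has to travel with the zip.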

Import by location fares much worse in this case. My tool has no way of figuring out which ontology imports which; it must be told. If the zip file only contains owl files, then it is a human who must figure out the imports. Import trees can also be pretty complicated, as they have been in several recent examples. This is aggravated when, as in one case, the ontologies in question use different methods of importing the same ontology. This means that my zip file must include a file that records *all* the different ways in which the owl files are downloaded. As different tools will use different versions of this file, I will need to convert the file to all the different formats. This seems very awkward.