Difference between revisions of "How Owl Imports Work"
Lee W. Lacy (talk | contribs) m (fixed typo xm:base to xml:base LWL) |
Latest revision as of 10:44, July 25, 2011
OWL 1.0 Imports
Before starting, I should point out that the way imports work in OWL 2.0 is different from the way that imports work in OWL 1.0. The short story is that the OWL 2.0 group has chosen import by location. The OWL 2.0 scheme has not been fully implemented by either Protege 3 or Protege 4, though both editors will load such import statements. This new scheme will be fully supported by future versions of the OWL API and Protege 4.
First, it must be understood that the semantics of imports in OWL is a subject of some controversy. In the OWL 1.1 specification there is an effort in progress to clear up some of this confusion. Until this is settled, we will use the definition of the semantics of imports given in the semantics document of the w3.org specs:
- Aside from this local meaning, an owl:imports annotation also imports the contents of another OWL ontology into the current ontology. The imported ontology is the one, if any, that has as name the argument of the imports construct. (This treatment of imports is divorced from Web issues. The intended use of names for OWL ontologies is to make the name be the location of the ontology on the Web, but this is outside of this formal treatment.)
In this note we will describe what this innocent little paragraph is saying and what it means to Protege.
Names of Ontologies
All owl ontologies have a name. The w3.org document describing this naming process can be found at this link. Unfortunately the way that this name is calculated (at least for the RDF/XML OWL syntax) is not well specified. Even so - in most cases it can be calculated in an unambiguous way. In most RDF/XML ontologies there is a single RDF resource that is declared to be of type owl:Ontology. So for instance in the pizza owl ontology, the declaration in question is the following:
<owl:Ontology rdf:about="">
  <protege:defaultLanguage rdf:datatype="http://www.w3.org/2001/XMLSchema#string">en</protege:defaultLanguage>
  <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">version 1.3</owl:versionInfo>
  <rdfs:comment xml:lang="en">An example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by Manchester University (see http://www.co-ode.org/resources/tutorials/)</rdfs:comment>
  <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
</owl:Ontology>
The first line of this declaration defines an RDF resource of type owl:Ontology. The name of this resource is given by the rdf:about attribute. In this case the rdf:about string is empty, which means that the xml:base is used. The xml:base declaration can be found near the top of the pizza ontology, alongside the namespace declarations:
<rdf:RDF
    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
No other declarations of an owl:Ontology resource occur in the pizza ontology, so the name of this resource (i.e. http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl) is the name of the pizza ontology. It is relevant to this discussion that the name of the pizza ontology is a working URL and clicking on the link pulls up the pizza ontology. In summary:
- the pizza ontology contains exactly one owl:Ontology declaration,
- this declaration is named by rdf:about="",
- the xml:base is declared and
- the xml:base is a working URL that points to the pizza ontology.
This situation is the simplest case and in this case things tend to work very smoothly.
However, none of the assumptions above are guaranteed to hold. Sometimes there are no owl:Ontology declarations in the owl ontology. In this case the document base is used as the name of the ontology. The document base is given by the xml:base statement, or by the physical location that was used to retrieve the ontology if no xml:base declaration is present. If this is still ambiguous, I am not sure what happens.
In addition, sometimes the rdf:about statement does not use the empty string. This allows one to give the ontology a name other than the xml:base. Finally, the name of the ontology may only be a URI and may not resolve to the right ontology as a URL. Each of these cases adds some difficulty for ontology developers and to some degree motivates the writing of this document.
There is another case that has been troublesome in the past. Occasionally there is more than one owl:Ontology declaration. In this case tools sometimes get confused and give different results. Protege works by taking the first declaration of an owl:Ontology individual to be the name of the ontology. This would be fine except that the notion of first declaration in RDF/XML is not well defined.
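The naming rules above can be sketched in a few lines of Python. This is only an illustration, not how Protege computes the name: the function `ontology_name` and its `retrieved_from` parameter are hypothetical, it only recognises owl:Ontology elements written as direct children of rdf:RDF, and it assumes that "first declaration" means first in document order.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def ontology_name(rdf_xml, retrieved_from):
    """Guess the name of an ontology from its RDF/XML text (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    # Document base: xml:base if declared, otherwise the physical location.
    base = root.get("{%s}base" % XML_NS, retrieved_from)
    headers = root.findall("{%s}Ontology" % OWL_NS)
    if not headers:
        # No owl:Ontology declaration: the document base is the name.
        return base
    # Assumption: "first" declaration means first in document order.
    about = headers[0].get("{%s}about" % RDF_NS, "")
    # An empty rdf:about resolves to the base; a non-empty one overrides it.
    return urljoin(base, about)
```

For the pizza ontology, where rdf:about is empty and xml:base is declared, this sketch returns the xml:base value.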
Import By Name
If we look at the owl:Ontology header of the pizza ontology, we see that the declared owl:Ontology resource is related to another resource by the owl:imports property:
<owl:Ontology rdf:about="">
  <protege:defaultLanguage rdf:datatype="http://www.w3.org/2001/XMLSchema#string">en</protege:defaultLanguage>
  <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">version 1.3</owl:versionInfo>
  <rdfs:comment xml:lang="en">An example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by Manchester University (see http://www.co-ode.org/resources/tutorials/)</rdfs:comment>
  <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
</owl:Ontology>
As triples the owl:imports statement would look like this:
<http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl> owl:imports <http://protege.stanford.edu/plugins/owl/protege>.
This statement tells tools such as Protege or reasoners that the pizza ontology should import an ontology whose name is
http://protege.stanford.edu/plugins/owl/protege.
So now the problem is: how do we find an ontology with this name? The easy answer would be to simply use the name of the imported ontology as a URL and fetch the imported ontology in this way. Most of the time this works (e.g. this example) but in general the retrieval is not that simple.
Suppose that you are on a train commuting home from work with no internet connection and you decide to edit the pizza ontology. If Protege uses the URL to try to retrieve the ontology it will fail. In this case, this will not be that inconvenient. The user can still edit. But with more complex ontologies with more critical imports (e.g. the birnlex ontology) not finding the imports may make working on the ontology impossible.
In this case, both Protege 3 and Protege 4 (and most other owl ontology tools) have a mechanism by which a user can tell the tool where to search for an ontology with the right name. In Protege 3 the spaces of ontologies to search are called ontology repositories. In Protege 4 they are called ontology libraries. A typical ontology repository or library will consist of all the ontologies in a given directory on the disk. When Protege is trying to resolve an import, it will first search in all the defined ontology repositories for an ontology whose name matches the desired imported name. If it doesn't find any matches, then it tries the ontology name as a URL. If this fails, then the import will fail. This is not generally a fatal error; it just means that Protege does not have access to the assertions contained in that import.
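The lookup order just described (repositories first, then the name as a URL, then a non-fatal failure) can be sketched as follows. This is not Protege code: `resolve_import` is a hypothetical function, repositories are modelled as plain dictionaries from ontology name to document, and `fetch_url` is an assumed callable that returns a document or None.

```python
def resolve_import(name, repositories, fetch_url):
    """Return (source, document) for an imported name, or None if not found.

    repositories: list of dicts mapping ontology name -> document (assumption).
    fetch_url: callable returning a document, or None on failure (assumption).
    """
    # 1. Search every configured repository for an ontology with this name.
    for repo in repositories:
        if name in repo:
            return ("repository", repo[name])
    # 2. Fall back to treating the ontology name as a URL.
    document = fetch_url(name)
    if document is not None:
        return ("url", document)
    # 3. Not fatal: the assertions of this import are simply unavailable.
    return None
```

On the train with no internet connection, `fetch_url` would always return None, so only ontologies present in a repository would resolve.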
The repository mechanism for Protege 3 is described at link. The Protege 4 ontology library documentation will be coming soon.
What Can Go Wrong
There are two main things that can go wrong. First, Protege may fail to find the imported ontology. Second, Protege may follow the above algorithm but end up retrieving an ontology with the wrong name.
Ontology Not Found
The ontology
http://protege.stanford.edu/junitOntologies/testset/NonExistentImport.owl
has an import statement that points to an ontology that does not exist. In this case both Protege 3 and Protege 4 will generate a dialog with the user to find the location of the imported file. The user has the choice of either ignoring the import or pointing Protege to a location where the imported ontology can be found.
In Protege 3 the dialog looks like this:
In Protege 4 the dialog looks like this:
Ontology Found Has The Wrong Name
The ontology at the location
http://protege.stanford.edu/junitOntologies/testset/uglyImport.owl
has an import statement that imports an ontology with the wrong name. More specifically the ontology header is as follows:
<owl:Ontology rdf:about="">
  <owl:imports rdf:resource="http://protege.stanford.edu/junitOntologies/testset/travel.owl"/>
</owl:Ontology>
However if you follow the link http://protege.stanford.edu/junitOntologies/testset/travel.owl it turns out that the name of the ontology in this location is http://www.owl-ontologies.com/travel.owl.
So in this case, if Protege follows the specification of imports described above then it would not import any statements from the ontology found at the location http://protege.stanford.edu/junitOntologies/testset/travel.owl. The ontology found at that location is not the ontology requested by the import. So the import is not found. Strictly speaking, this is the correct behavior of the Protege tool.
However this behavior is almost never what the user wants. So in both Protege 3 and Protege 4 we adopted a compromise position. We import the assertions from the ontology
http://www.owl-ontologies.com/travel.owl
that we found at the location http://protege.stanford.edu/junitOntologies/testset/travel.owl and we leave a visual cue in the imports tree for the ontology that the import is broken.
In Protege 4 the visual cue is the little green triangle to the right in the diagram below:
A nice feature of Protege 4 is that clicking on the green triangle will bring up a dialog that asks if you want to fix the import:
The advantage of fixing the import is that it will be easier to edit the ontology offline, tools will work better and the ontology will be suitable for use with an application. The disadvantage is that users and tools will either have to know or be told the correct location for the ontology.
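The compromise described in this section amounts to keeping the assertions of whatever ontology was found while recording that its name does not match the requested one. A minimal sketch, where `ImportResult` and `check_import` are hypothetical names and not part of Protege:

```python
from dataclasses import dataclass


@dataclass
class ImportResult:
    requested: str  # name given in the owl:imports statement
    found: str      # name declared by the ontology actually retrieved
    broken: bool    # True when the two names differ


def check_import(requested_name, found_name):
    """Accept the retrieved ontology but flag a name mismatch."""
    return ImportResult(requested_name, found_name,
                        broken=(requested_name != found_name))
```

For the uglyImport.owl example, the requested name is http://protege.stanford.edu/junitOntologies/testset/travel.owl and the name found is http://www.owl-ontologies.com/travel.owl, so the result would be flagged as broken.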
Why Not Import By Location?
This is where there are different opinions.
Import by location is a model where the name of an ontology is irrelevant to the import statement. The import statement simply gives a physical address at which to look for the ontology. Several people have argued that this is the right model. Admittedly the W3C documents do not have satisfactory solutions to the real world problems involving imports. Hopefully the new OWL 1.1 specification will fix the problem and we can put it to rest.
The one case where import by location shines and import by name does very badly is the following:
- The internet is trusted, available and reliable, ontologies are never relocated, people choose to give their ontologies names that do not correspond to their physical location and all the ontologies of interest are on the internet.
With import by location, the import will always be found. With import by name, a person reading an ontology off the web may not be able to determine where to find the imported ontology. While there are several conditions that need to be met for import by location to work, it must be noted that this is not an uncommon situation.
Note that usually an ontology has a name that does not correspond to its physical location only when the ontology has been relocated. When an ontology is relocated, the import by location scheme would require all ontologies making the import to be modified. Once the new location is updated in all importing ontologies we would be back in this use case where import by location works.
I will lump the other use cases together but they may have important differences.
1. I am commuting home from work with no internet access and unzip a collection of owl files.
2. I am developing an application which may not have access to the internet and/or may not be willing to trust the internet even if it had access.
3. I have access to the internet but I want to edit some (must be more than one) ontology that I download off the web.
4. Web servers, projects and organizations come and go and ontologies are relocated.
In these cases, to varying degrees, import by name works very well and this is why I think it is the right choice. (Cases 1 and 2 are close to my heart.) Consider use case 1 because in some sense it is an extreme. In this case - with import by name - I simply plop the owl files on my disk and my tool can easily determine which ontologies import which. It just needs to parse the import statements and the ontology declarations from the files in question.
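To make use case 1 concrete, here is a rough sketch of how a tool could work out which local file imports which, using only the ontology declarations and import statements and no network access. The functions `read_header` and `import_graph` are hypothetical, and the sketch assumes each file has at most one owl:Ontology element written as a direct child of rdf:RDF.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def read_header(rdf_xml, location):
    """Return (ontology name, list of imported names) for one file."""
    root = ET.fromstring(rdf_xml)
    base = root.get("{%s}base" % XML_NS, location)
    header = root.find("{%s}Ontology" % OWL_NS)
    if header is None:
        return base, []
    name = urljoin(base, header.get("{%s}about" % RDF_NS, ""))
    imports = [urljoin(base, imp.get("{%s}resource" % RDF_NS, ""))
               for imp in header.findall("{%s}imports" % OWL_NS)]
    return name, imports


def import_graph(files):
    """files: dict mapping a local path to its RDF/XML text.

    Returns a dict mapping each path to the paths it imports;
    an import that no local file satisfies appears as None.
    """
    headers = {path: read_header(text, path) for path, text in files.items()}
    by_name = {name: path for path, (name, _) in headers.items()}
    return {path: [by_name.get(imported) for imported in imports]
            for path, (_, imports) in headers.items()}
```

Note that the file names play no part in the matching: only the declared names and the names in the import statements are compared.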
Import by location fares much worse in this case. Since IO operations to the internet fail, my tool has no way of figuring out which ontology imports which - it must be told. If the zip file only contains owl files then it is a human who must figure out the imports. Also import trees can be pretty complicated as they have been in several recent examples. This is aggravated when - as in one case - the ontologies in question use different methods of importing the same ontology. This means that my zip file must include a file that records *all* the different ways in which the owl files are downloaded. As different tools will use different versions of this file, I will need to convert the file to all the different formats.