How Owl 2.0 Imports Work
Latest revision as of 11:44, November 30, 2009
OWL 2.0 Imports
Under Construction!
Motivation
In OWL 2, imports are handled differently than they are in OWL 1.0. There have been two main changes:
- added support for versions of an ontology
- using import by location rather than import by name.
The motivation for the first of these changes is pretty clear. OWL 2.0 supports versions by allowing an ontology to have two IRIs in its name. The first is the ontology IRI. The second is the version IRI for the ontology. In many cases the version IRI will be null. When the version IRI is not null, the ontology is a specific version of the ontology named by the ontology IRI.
Thus for example, I might have an ontology that I am working on which I call
http://www.tigraworld.com/protege/determinants.owl.
After a while I start needing versions of this ontology, so I create an ontology with an ontology IRI
http://www.tigraworld.com/protege/determinants.owl
and a version IRI
http://www.tigraworld.com/protege/determinants-1.0.owl.
A later published version of this ontology might have the version IRI
http://www.tigraworld.com/protege/determinants-2.0.owl.
The scheme by which these versions are named is not defined by the OWL 2.0 specification.
The intent is that these IRIs can be used to look up an ontology. If an ontology has a version IRI, then following the version IRI using the specified protocol should retrieve the ontology with that version. Thus version 1.0 of the determinants ontology can be found at the web location
http://www.tigraworld.com/protege/determinants-1.0.owl.
Following the ontology IRI, e.g.
http://www.tigraworld.com/protege/determinants.owl
should retrieve the latest version of that ontology (which may or may not have a version IRI). However, it should be noted that the reason OWL 2.0 uses import by location is that it is often the case that ontologies cannot be retrieved by name.
When importing, these two names allow ontology developers to specify which version of an ontology they want to import. They can specify a particular version of an ontology by importing the version IRI. They can specify the latest version of an ontology by importing the ontology IRI.
The second change to OWL 2.0 imports is the main subject of this note. OWL 2.0 uses an import by location scheme rather than the import by name scheme used in OWL 1.0. This simply means that an import declaration is a directive to import the ontology that can be found at the physical location represented by the imported IRI. The reason that OWL 2.0 changed to import by location is that in many cases ontologies cannot be found by name. This meant that many OWL ontologies could not use the import by name scheme for their imports, because there would be no way for applications or users to find the imported ontology. With import by location, the importing ontology always states where the imported ontology can be found.
Offline Editing and XML Catalogs
The disadvantage of the import by location scheme is that it adds a bit of complexity when a user wants to download some ontologies from the internet and either edit them on the hard drive or work with them while offline. To make this concrete, suppose that there are two ontologies on the internet which are located on the web at the locations
http://www.tigraworld.com/protege/determinants.owl
and
http://www.tigraworld.com/protege/continuedFractions.owl.
Suppose that the determinants.owl ontology imports the continuedFractions.owl ontology with the following import declaration
import http://www.tigraworld.com/protege/continuedFractions.owl.
If the user downloads these ontologies to disk and invokes an ontology editing tool on the determinants.owl ontology, the ontology editing tool will naturally import the continuedFractions.owl ontology from its web site at
http://www.tigraworld.com/protege/continuedFractions.owl.
If the user wants the import of continuedFractions.owl to redirect to the copy of the continuedFractions.owl ontology on the user's local disk, the user needs to use XML Catalogs. XML Catalogs allow the user to specify that the process of resolving the URL
http://www.tigraworld.com/protege/continuedFractions.owl
be redirected to a specific location on the local drive. For users who are familiar with Protege 3.4 ontology repositories, the XML catalog will play a role very similar to that of the .repository files in Protege 3.4. The big advantage of XML catalogs is that they are a standard mechanism that can be used by any tool that understands OWL.
Thus XML Catalogs will become an essential part of sharing ontologies. It is therefore important that tools support a variety of mechanisms for generating XML Catalogs.
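To make the redirect concrete, here is a minimal sketch in Python of writing a catalog with a single entry. It assumes the OASIS XML Catalogs format, in which a uri element maps a name to a replacement location. The function name build_catalog and the relative local path are my own choices for illustration and are not something Protege defines.

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs format.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(redirects):
    """Return an XML catalog (as text) with one <uri> entry per redirect.

    redirects maps an imported IRI to the local file that should be
    used in its place.
    """
    # Serialize the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for name, local_path in redirects.items():
        ET.SubElement(
            root, f"{{{CATALOG_NS}}}uri", {"name": name, "uri": local_path}
        )
    return ET.tostring(root, encoding="unicode")


# Redirect the import from the running example to a file next to the catalog.
catalog_xml = build_catalog({
    "http://www.tigraworld.com/protege/continuedFractions.owl":
        "continuedFractions.owl",
})
print(catalog_xml)
```

A tool that understands XML catalogs would consult this entry before going to the web, so the import resolves to the local file even when the user is offline.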
Building XML Catalogs
In this section, we will consider the problem of automatically generating XML catalogs. Assuming that a set of files on disk has been downloaded from the internet, the question we want to ask is
- Where can these files be found on the internet?
Obviously this question cannot be answered in general. However, ontologies contain a few pointers that are supposed to indicate where they can be found, specifically the xml base, the version IRI and the ontology IRI. Note that the preferred approach when users share OWL ontologies through e-mail is for them to provide an XML catalog along with the OWL files that they share. This would make the automatic generation of the XML catalog unnecessary.
Generating XML Catalogs During download
This is the most accurate way of building an xml catalog. The other methods described here are heuristics and as such are optional and can be overridden by a user. I will describe this process with a simple example.
Suppose a user wants to download the ontology
http://www.tigraworld.com/protege/determinants.owl
and its imports to disk. As part of this download process, Protege will build an xml catalog which reflects where each of the imports was found. So when the Protege tool processes the import statement
import http://www.tigraworld.com/protege/continuedFractions.owl
it will convert
http://www.tigraworld.com/protege/continuedFractions.owl
to a URL and download what it finds into a file (probably continuedFractions.owl) on the hard disk. When it does this, it can add an entry to the xml catalog recording that an import of
http://www.tigraworld.com/protege/continuedFractions.owl
should be redirected to the continuedFractions.owl file on the disk.
To be fully robust, this algorithm will have to be a little more complicated than this. For example, we have seen ontologies where the same ontology is imported using multiple distinct URIs. This means that the download algorithm needs to detect duplicates and do the right thing both in how it saves the files and in how it updates the xml catalog.
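The download process described above might be sketched as follows. This is only an illustration under several assumptions of mine. The caller supplies a fetch function that returns the content and the imported IRIs for a given IRI, duplicates are detected by comparing a hash of the downloaded bytes, and the alias URL under example.org is made up for the demonstration. It is not a description of what Protege actually does.

```python
import hashlib
from collections import deque


def download_with_catalog(root_iri, fetch):
    """Download root_iri and its transitive imports.

    fetch(iri) is a caller-supplied (hypothetical) function returning
    (content_bytes, list_of_imported_iris).
    Returns (files, catalog): files maps a local file name to its content,
    catalog maps each imported IRI to the local file name that replaces it.
    """
    files, catalog, by_digest = {}, {}, {}
    pending = deque([root_iri])
    while pending:
        iri = pending.popleft()
        if iri in catalog:
            continue  # this IRI has already been processed
        content, imports = fetch(iri)
        digest = hashlib.sha256(content).hexdigest()
        if digest in by_digest:
            # Same bytes reached through a different IRI: reuse the file.
            catalog[iri] = by_digest[digest]
            continue
        base = iri.rstrip("/").rsplit("/", 1)[-1] or "ontology.owl"
        name, n = base, 1
        while name in files:
            # A different ontology shares the file name: pick a fresh one.
            name = f"{n}-{base}"
            n += 1
        files[name] = content
        by_digest[digest] = name
        catalog[iri] = name
        pending.extend(imports)
    return files, catalog


# A stand-in for the web; the example.org alias is invented for this demo.
fake_web = {
    "http://www.tigraworld.com/protege/determinants.owl": (
        b"determinants",
        ["http://www.tigraworld.com/protege/continuedFractions.owl",
         "http://example.org/alias/cf.owl"],
    ),
    "http://www.tigraworld.com/protege/continuedFractions.owl":
        (b"continued fractions", []),
    "http://example.org/alias/cf.owl": (b"continued fractions", []),
}
files, catalog = download_with_catalog(
    "http://www.tigraworld.com/protege/determinants.owl", fake_web.__getitem__
)
```

Comparing bytes only catches exact copies. Two serializations of the same ontology that differ in whitespace or syntax would need a comparison of the ontology IRIs instead.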
Using XML Base
This algorithm is a heuristic and can be turned off or overridden. This is the recommended algorithm because it is fast and it generally returns the information that the user needs.
In the specification of xml base, it is stated that the xml base should represent the location where a file can be found. Thus if the continuedFractions.owl file is found on disk (in rdf/xml format), it is very likely that its xml base will be
http://www.tigraworld.com/protege/continuedFractions.owl.
This would suggest that an import statement of the form
import http://www.tigraworld.com/protege/continuedFractions.owl
can safely be redirected to the continuedFractions.owl file on disk and the xml catalog can be updated accordingly.
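As a rough sketch of this heuristic, the following Python reads the xml:base attribute from the root element of an rdf/xml document and proposes a catalog entry for it. The function names and the sample document are assumptions made for illustration. A file with no xml:base on its root element simply yields no entry.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the fixed XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def xml_base_of(rdf_xml_text):
    """Return the xml:base of the root element, or None if it has none."""
    return ET.fromstring(rdf_xml_text).get(XML_BASE)


def catalog_entry_from_xml_base(rdf_xml_text, local_path):
    """Guess a redirect {imported IRI: local file} from the xml:base."""
    base = xml_base_of(rdf_xml_text)
    return {base: local_path} if base else {}


# A cut-down stand-in for the continuedFractions.owl file on disk.
SAMPLE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xml:base="http://www.tigraworld.com/protege/continuedFractions.owl"/>'
)
entry = catalog_entry_from_xml_base(SAMPLE, "continuedFractions.owl")
```

Only the root element has to be inspected, which is part of why this heuristic is fast.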
Using the Ontology IRI or Version IRI
If an ontology has a version IRI, then it should be the case that this ontology can be retrieved by turning the version IRI into a URL. Thus if I have a file on disk called determinants.owl which has an ontology IRI of
http://www.tigraworld.com/protege/determinants.owl
and a version IRI of
http://www.tigraworld.com/protege/determinants-1.0.owl
then this version of the ontology should be found at the location
http://www.tigraworld.com/protege/determinants-1.0.owl.
This means that if I have a file on disk called determinants.owl which has an ontology IRI of
http://www.tigraworld.com/protege/determinants.owl
and a version IRI of
http://www.tigraworld.com/protege/determinants-1.0.owl
then I can guess that the imports directive of
import http://www.tigraworld.com/protege/determinants-1.0.owl
can be safely redirected to the determinants.owl file on the disk.
Similarly, if an ontology has an ontology IRI but no version IRI, then it should be possible to find the ontology using the ontology IRI. Both of these cases should work in practice and are pretty safe heuristics.
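The two cases can be sketched together in Python. The sketch assumes the ontology is stored in rdf/xml with an owl:Ontology element directly under the root, that the version IRI appears as an owl:versionIRI child with an rdf:resource attribute, and that the IRIs are written in absolute form. The function name guess_import_iri is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def guess_import_iri(rdf_xml_text):
    """Return the IRI whose import can be redirected to this file.

    Prefers the version IRI and falls back to the ontology IRI.
    Relative IRIs (for example rdf:about="") are not resolved here.
    """
    root = ET.fromstring(rdf_xml_text)
    onto = root.find(f"{{{OWL}}}Ontology")
    if onto is None:
        return None
    version = onto.find(f"{{{OWL}}}versionIRI")
    if version is not None and version.get(f"{{{RDF}}}resource"):
        return version.get(f"{{{RDF}}}resource")
    return onto.get(f"{{{RDF}}}about")


HEAD = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#">'
)
# A cut-down stand-in for determinants.owl with a version IRI.
VERSIONED = HEAD + (
    '<owl:Ontology rdf:about="http://www.tigraworld.com/protege/determinants.owl">'
    '<owl:versionIRI rdf:resource="http://www.tigraworld.com/protege/determinants-1.0.owl"/>'
    '</owl:Ontology></rdf:RDF>'
)
# The same ontology without a version IRI.
UNVERSIONED = HEAD + (
    '<owl:Ontology rdf:about="http://www.tigraworld.com/protege/determinants.owl"/>'
    '</rdf:RDF>'
)
versioned_iri = guess_import_iri(VERSIONED)
unversioned_iri = guess_import_iri(UNVERSIONED)
```

When a version IRI is present, the sketch deliberately does not also map the ontology IRI to the file, since the ontology IRI is meant to retrieve the latest version, which may differ from the copy on disk.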