Difference between revisions of "How Owl 2.0 Imports Work"

From Protege Wiki

Latest revision as of 12:44, November 30, 2009

OWL 2.0 Imports

Under Construction!


Motivation

In OWL 2.0, imports are handled differently than they are in OWL 1.0. There have been two main changes:

  • added support for versions of an ontology
  • using import by location rather than import by name.

The motivation for the first of these changes is pretty clear. OWL 2.0 supports versions by allowing an ontology to have two IRIs in its name. The first is the ontology IRI. The second is the version IRI for the ontology. In many cases the version IRI will be null. But when the version IRI is not null, the ontology is a specific version of the ontology named by the ontology IRI.

Thus for example, I might have an ontology that I am working on which I call

   http://www.tigraworld.com/protege/determinants.owl.

After a while I start needing versions of this ontology, so I create an ontology with an ontology IRI

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI

   http://www.tigraworld.com/protege/determinants-1.0.owl.

A later published version of this ontology might have the version IRI

   http://www.tigraworld.com/protege/determinants-2.0.owl.

The scheme by which these versions are named is not defined by the OWL 2.0 specification.

The intent is that these IRIs can be used to look up an ontology. If an ontology has a version IRI, then following the version IRI using the protocol it specifies should retrieve the ontology with that version. Thus version 1.0 of the determinants ontology can be found at the web location

   http://www.tigraworld.com/protege/determinants-1.0.owl.

Following the ontology IRI, e.g.

   http://www.tigraworld.com/protege/determinants.owl.

should retrieve the latest version of that ontology (which may or may not have a version IRI). However, it should be noted that the reason OWL 2.0 uses import by location is that it is often the case that ontologies cannot be retrieved by name.

When importing, these two names allow ontology developers to specify which version of an ontology they want to import. They can specify a particular version of an ontology by importing its version IRI. They can specify the latest version of an ontology by importing the ontology IRI.

The second change to OWL 2.0 imports is the main subject of this note. OWL 2.0 uses an import by location scheme rather than the import by name scheme used in OWL 1.0. This simply means that an import declaration is a directive to import the ontology that can be found at the physical location represented by the imported IRI. The reason that OWL 2.0 changed to import by location is that in many cases ontologies cannot be found by name. As a result, many OWL ontologies could not use the import by name scheme, because there would be no way for applications or users to find the imported ontology. With import by location, the importing ontology always states where the imported ontology can be found.

Offline Editing and XML Catalogs

The disadvantage of the import by location scheme is that it adds a bit of complexity when a user wants to download some ontologies from the internet and either edit them on the hard drive or work with them while offline. To make this concrete, suppose that there are two ontologies on the web at the locations

    http://www.tigraworld.com/protege/determinants.owl

and

    http://www.tigraworld.com/protege/continuedFractions.owl.

Suppose that the determinants.owl ontology imports the continuedFractions.owl ontology with the following import declaration

    import http://www.tigraworld.com/protege/continuedFractions.owl.

If the user downloads these ontologies to disk and invokes an ontology editing tool on the determinants.owl ontology, the ontology editing tool will naturally import the continuedFractions.owl ontology from its web site at

    http://www.tigraworld.com/protege/continuedFractions.owl.

If the user wants the import of continuedFractions.owl to redirect to the copy of the continuedFractions.owl ontology on the user's local disk, the user needs to use XML Catalogs. XML Catalogs allow the user to specify that the process of resolving the import

    import http://www.tigraworld.com/protege/continuedFractions.owl.

be redirected to a specific location on the local drive. For users who are familiar with Protege 3.4 ontology repositories, the XML catalog will play a role very similar to that of the .repository files in Protege 3.4. The big advantage of XML catalogs is that they are a standard mechanism that can be used by any tool that understands OWL.
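
As a rough illustration of the redirection described above, the following Python sketch builds a one-entry catalog and resolves an import against it. The namespace and the uri element with its name and uri attributes come from the OASIS XML Catalogs specification; the function names build_catalog and resolve, and the relative file name used as the target, are assumptions made for this example and are not part of Protege.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XML Catalogs specification.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)


def build_catalog(redirects):
    """Return an XML catalog (as text) with one <uri> entry per redirect.

    redirects maps an imported IRI to a local location (hypothetical helper).
    """
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for name, local in redirects.items():
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", {"name": name, "uri": local})
    return ET.tostring(root, encoding="unicode")


def resolve(catalog_xml, iri):
    """Return the redirected location for iri, or iri itself if unmapped."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        if entry.get("name") == iri:
            return entry.get("uri")
    return iri


catalog = build_catalog({
    "http://www.tigraworld.com/protege/continuedFractions.owl":
        "continuedFractions.owl",
})
local = resolve(catalog, "http://www.tigraworld.com/protege/continuedFractions.owl")
```

An import that has no entry in the catalog falls through unchanged, so the tool would still fetch it from the web.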

Thus XML Catalogs will become an essential part of sharing ontologies. It is therefore important that tools support a variety of mechanisms for generating XML Catalogs.

Building XML Catalogs

In this section, we will consider the problem of automatically generating XML catalogs. Assuming that a set of files has been downloaded from the internet to a disk, the question we want to ask is

Where can these files be found on the internet?

Obviously this question cannot be answered in general. However, ontologies contain a few pointers that are supposed to indicate where they can be found, specifically the XML base, the ontology version and the ontology name. Note that the preferred approach when users are sharing OWL ontologies through e-mail is that they provide an XML catalog with the OWL files that they share. This would make the automatic generation of the XML catalog unnecessary.

Generating XML Catalogs During download

This is the most accurate way of building an xml catalog. The other methods described here are heuristics and as such are optional and can be overridden by a user. I will describe this process with a simple example.

Suppose a user wants to download the ontology

    http://www.tigraworld.com/protege/determinants.owl

and its imports to disk. As part of this download process, Protege will build an xml catalog which reflects where each of the imports was found. So when the Protege tool processes the import statement

    import http://www.tigraworld.com/protege/continuedFractions.owl

it will convert

    http://www.tigraworld.com/protege/continuedFractions.owl

to a url and download what it finds into a file (probably continuedFractions.owl) on the hard disk. When it does this it can add an entry into the xml catalog reflecting that an import of

    http://www.tigraworld.com/protege/continuedFractions.owl

should be redirected to the continuedFractions.owl file on the disk.

To be fully robust, this algorithm will have to be a little more complicated than this. For example, we have seen ontologies where the same ontology is imported using multiple distinct URIs. This means that the download algorithm would need to detect duplicates and do the right thing both in terms of how it saves the files and how it updates the XML catalog.
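
The download process described above can be sketched in Python as follows. This is only an outline under stated assumptions, not Protege's implementation: fetch and find_imports are hypothetical hooks supplied by the caller (one retrieves the text at an IRI, the other lists the IRIs a document imports), and duplicates are detected by identical content, which is one possible interpretation of "doing the right thing".

```python
import posixpath
from urllib.parse import urlsplit


def download_with_catalog(root_iri, fetch, find_imports):
    """Download root_iri and its imports; return (files, catalog).

    fetch(iri) -> str and find_imports(text) -> list of IRIs are
    hypothetical caller-supplied hooks, not a real library API.
    """
    files = {}       # local file name -> document text
    catalog = {}     # imported IRI -> local file name
    by_content = {}  # document text -> local file name (duplicate check)
    pending = [root_iri]
    while pending:
        iri = pending.pop()
        if iri in catalog:
            continue  # already downloaded through this IRI
        text = fetch(iri)
        if text in by_content:
            # Same document reached through a different IRI: reuse the file.
            catalog[iri] = by_content[text]
            continue
        base = posixpath.basename(urlsplit(iri).path) or "ontology.owl"
        name, n = base, 1
        while name in files:  # avoid clobbering a different document
            name = f"{n}-{base}"
            n += 1
        files[name] = text
        by_content[text] = name
        catalog[iri] = name
        pending.extend(find_imports(text))
    return files, catalog


# Usage with an in-memory stand-in for the web (illustrative data only).
PAGES = {
    "http://www.tigraworld.com/protege/determinants.owl": "determinants",
    "http://www.tigraworld.com/protege/continuedFractions.owl": "fractions",
}
IMPORTS = {
    "determinants": ["http://www.tigraworld.com/protege/continuedFractions.owl"],
    "fractions": [],
}
files, catalog = download_with_catalog(
    "http://www.tigraworld.com/protege/determinants.owl",
    PAGES.__getitem__,
    IMPORTS.__getitem__,
)
```

The returned catalog dictionary holds exactly the redirects that would be written out as entries in the XML catalog.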

Using XML Base

This algorithm is a heuristic and can be turned off or overridden. This is the recommended algorithm because it is fast and it generally returns the information that the user needs.

In the specification of xml base, it is stated that the xml base should represent the location where a file can be found. Thus if the continuedFractions.owl file is found on disk (in rdf/xml format), it is very likely that its xml base will be

    http://www.tigraworld.com/protege/continuedFractions.owl.

This would suggest that an import statement of the form

   import     http://www.tigraworld.com/protege/continuedFractions.owl

can safely be redirected to the continuedFractions.owl file on disk and the xml catalog can be updated accordingly.
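
A minimal Python sketch of this heuristic, assuming the file is RDF/XML and carries its xml:base on the root element. The function name catalog_entry_from_xml_base is made up for this example; only the xml:base attribute itself comes from the XML Base specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def catalog_entry_from_xml_base(rdfxml_text, local_file):
    """Guess a redirect {xml:base IRI: local file} for an RDF/XML document.

    Returns an empty dict when the root element declares no xml:base.
    """
    root = ET.fromstring(rdfxml_text)
    base = root.get(XML_BASE)
    return {base: local_file} if base else {}


sample = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://www.tigraworld.com/protege/continuedFractions.owl"/>"""

entry = catalog_entry_from_xml_base(sample, "continuedFractions.owl")
```

Because only the root element is inspected, the check is cheap, which fits the claim that this heuristic is fast.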


Using the Ontology IRI or Version IRI

If an ontology has a version IRI, then it should be the case that this ontology can be retrieved by turning the version IRI into a URL. Thus if I have a file on disk called determinants.owl which has an ontology IRI of

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI of

   http://www.tigraworld.com/protege/determinants-1.0.owl

then this version of the ontology should be found at the location

   http://www.tigraworld.com/protege/determinants-1.0.owl.

This means that, given such a determinants.owl file on disk, I can guess that the imports directive of

   import http://www.tigraworld.com/protege/determinants-1.0.owl

can be safely redirected to the determinants.owl file on the disk.

Similarly, if an ontology has a name but no version IRI then it should be possible to find the ontology using the name. Both cases described so far should work and are pretty safe heuristics.
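
Both cases can be sketched together in Python. The sketch assumes the file is in RDF/XML, where the ontology IRI appears as rdf:about on the owl:Ontology element and the version IRI as an owl:versionIRI child with an rdf:resource attribute. It also assumes both IRIs are written out in absolute form. The function name guess_import_iri is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def guess_import_iri(rdfxml_text):
    """Return the IRI an import of this file would most likely use.

    Prefers the version IRI; falls back to the ontology IRI; returns
    None when no owl:Ontology element is found.
    """
    root = ET.fromstring(rdfxml_text)
    onto = root.find(f"{{{OWL}}}Ontology")
    if onto is None:
        return None
    version = onto.find(f"{{{OWL}}}versionIRI")
    if version is not None:
        return version.get(f"{{{RDF}}}resource")
    return onto.get(f"{{{RDF}}}about")


versioned = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                        xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.tigraworld.com/protege/determinants.owl">
    <owl:versionIRI rdf:resource="http://www.tigraworld.com/protege/determinants-1.0.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

guess = guess_import_iri(versioned)
```

The returned IRI would then be recorded in the XML catalog as redirecting to the file on disk.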