How Owl 2.0 Imports Work

From Protege Wiki

Latest revision as of 12:44, November 30, 2009

OWL 2.0 Imports

Under Construction!


Motivation

In OWL 2.0, imports are handled differently than in OWL 1.0. There are two main changes:

  • support for versions of an ontology
  • import by location rather than import by name

The motivation for the first of these changes is clear. OWL 2.0 supports versions by allowing an ontology to have two IRIs in its name. The first is the ontology IRI; the second is the version IRI. In many cases the version IRI will be absent (null). When it is present, the ontology is a specific version of the ontology named by the ontology IRI.

For example, I might be working on an ontology which I call

   http://www.tigraworld.com/protege/determinants.owl

After a while I start needing versions of this ontology, so I create an ontology with an ontology IRI

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI

   http://www.tigraworld.com/protege/determinants-1.0.owl

A later published version of this ontology might have the version IRI

   http://www.tigraworld.com/protege/determinants-2.0.owl

The scheme by which these versions are named is not defined by the OWL 2.0 specification.

The intent is that these IRIs can be used to look up an ontology. If an ontology has a version IRI, then dereferencing the version IRI with the protocol it specifies should retrieve that version of the ontology. Thus version 1.0 of the determinants ontology should be found at the web location

   http://www.tigraworld.com/protege/determinants-1.0.owl

Following the ontology IRI, e.g.

   http://www.tigraworld.com/protege/determinants.owl

should retrieve the latest version of that ontology (which may or may not have a version IRI). It should be noted, however, that ontologies often cannot be retrieved by name at all; this is precisely why OWL 2.0 uses import by location.

When importing, these two names allow ontology developers to specify which version of an ontology they want to import. They can request a specific version by importing its version IRI, or the latest version by importing its ontology IRI.
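The two names, and the choice an importer makes between them, can be captured in a small data structure. The Python below is a hypothetical sketch: `OntologyID` and `import_iri` are illustrative names, not Protege or OWL API types. It only shows which IRI would go into an import declaration in each case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyID:
    """The (up to) two names of an OWL 2.0 ontology."""
    ontology_iri: str
    version_iri: Optional[str] = None  # None when the ontology is unversioned

    def is_versioned(self) -> bool:
        return self.version_iri is not None

    def import_iri(self, pin_version: bool) -> str:
        """IRI to place in an import declaration.

        pin_version=True asks for this exact version (the version IRI, if any);
        pin_version=False asks for the latest version (the ontology IRI).
        """
        if pin_version and self.version_iri is not None:
            return self.version_iri
        return self.ontology_iri


determinants_1_0 = OntologyID(
    "http://www.tigraworld.com/protege/determinants.owl",
    "http://www.tigraworld.com/protege/determinants-1.0.owl",
)
print(determinants_1_0.import_iri(pin_version=True))
print(determinants_1_0.import_iri(pin_version=False))
```

An unversioned ontology always yields its ontology IRI, since there is no version to pin.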

The second change to OWL 2.0 imports is the main subject of this note. OWL 2.0 uses an import by location scheme rather than the import by name scheme used in OWL 1.0. This simply means that an import declaration is a directive to import the ontology found at the physical location represented by the imported IRI. OWL 2.0 changed to import by location because in many cases ontologies cannot be found by name: under import by name, applications and users often had no way to locate an imported ontology. With import by location, the importing ontology always states where the imported ontology can be found.

Offline Editing and XML Catalogs

The disadvantage of the import by location scheme is that it adds some complexity when a user wants to download ontologies from the internet and then edit them on the hard drive or work with them offline. To make this concrete, suppose that there are two ontologies on the web, located at

    http://www.tigraworld.com/protege/determinants.owl

and

    http://www.tigraworld.com/protege/continuedFractions.owl

Suppose that the determinants.owl ontology imports the continuedFractions.owl ontology with the following import declaration

    import http://www.tigraworld.com/protege/continuedFractions.owl

If the user downloads these ontologies to disk and opens the determinants.owl ontology in an ontology editing tool, the tool will naturally import the continuedFractions.owl ontology from its web site at

    http://www.tigraworld.com/protege/continuedFractions.owl

If the user wants the import of continuedFractions.owl to be redirected to the copy of continuedFractions.owl on the user's local disk, the user needs to use XML Catalogs. XML Catalogs allow the user to specify that resolving the URL

    http://www.tigraworld.com/protege/continuedFractions.owl

be redirected to a specific location on the local drive. For users familiar with Protege 3.4 ontology repositories, the XML catalog plays a role very similar to that of the .repository files in Protege 3.4. The big advantage of XML catalogs is that they are a standard mechanism that can be used by any tool that understands OWL.
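To make the redirection concrete, here is a minimal Python sketch of how a tool might consult a catalog before falling back to plain import by location. It assumes the catalog uses OASIS `uri` entries of the form `<uri name="..." uri="..."/>`; the function names are hypothetical, and resolving relative `uri` values against the catalog file's own location (or a group's xml:base) is deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def read_uri_entries(catalog_xml: str) -> Dict[str, str]:
    """Map each <uri name=...> to its uri=... value, including entries in <group>s."""
    root = ET.fromstring(catalog_xml)
    return {entry.get("name"): entry.get("uri")
            for entry in root.iter(f"{{{CATALOG_NS}}}uri")}


def resolve(import_iri: str, entries: Dict[str, str]) -> str:
    """Where to read an imported ontology from.

    A catalog entry redirects the import; without one, the imported IRI
    itself is the location (plain import by location).
    """
    return entries.get(import_iri, import_iri)


catalog = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group>
    <uri name="http://www.tigraworld.com/protege/continuedFractions.owl"
         uri="continuedFractions.owl"/>
  </group>
</catalog>"""

entries = read_uri_entries(catalog)
print(resolve("http://www.tigraworld.com/protege/continuedFractions.owl", entries))
print(resolve("http://www.tigraworld.com/protege/determinants.owl", entries))
```

The first import is redirected to the local file; the second has no catalog entry, so the tool would dereference the IRI itself.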

Thus XML Catalogs will become an essential part of sharing ontologies. It is therefore important that tools support a variety of mechanisms for generating XML Catalogs.

Building XML Catalogs

In this section, we consider the problem of automatically generating XML catalogs. Assuming that a set of ontology files has been downloaded from the internet to disk, the question we want to ask is

Where can these files be found on the internet?

In general, this question cannot be answered. However, ontologies contain several pointers that are supposed to indicate where they can be found: the xml:base, the version IRI and the ontology IRI. Note that the preferred approach when users share OWL ontologies by e-mail is to include an XML catalog with the OWL files they send; this makes automatic generation of the catalog unnecessary.

Generating XML Catalogs During Download

This is the most accurate way of building an XML catalog. The other methods described here are heuristics; as such they are optional and can be overridden by the user. I will describe this process with a simple example.

Suppose a user wants to download the ontology

    http://www.tigraworld.com/protege/determinants.owl

and its imports to disk. As part of this download process, Protege will build an XML catalog which records where each of the imports was found. So when Protege processes the import statement

    import http://www.tigraworld.com/protege/continuedFractions.owl

it will convert

    http://www.tigraworld.com/protege/continuedFractions.owl

to a URL and download what it finds into a file (probably continuedFractions.owl) on the hard disk. When it does this, it can add an entry to the XML catalog recording that an import of

    http://www.tigraworld.com/protege/continuedFractions.owl

should be redirected to the continuedFractions.owl file on the disk.
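The download-time procedure can be sketched as a worklist over owl:imports statements. This is a hypothetical Python illustration, not Protege's implementation: `fetch` stands in for whatever actually retrieves a document (an HTTP client, say), only RDF/XML documents using `owl:imports rdf:resource="..."` are understood, and each file is simply named after the last path segment of the IRI it was fetched from.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlparse

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def imports_of(rdf_xml: bytes) -> List[str]:
    """IRIs named by owl:imports statements in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    found = (e.get(f"{{{RDF}}}resource") for e in root.iter(f"{{{OWL}}}imports"))
    return [iri for iri in found if iri]


def download_closure(start_iri: str, fetch: Callable[[str], bytes]
                     ) -> Tuple[Dict[str, str], Dict[str, bytes]]:
    """Fetch an ontology and everything it imports, directly or indirectly.

    Returns (catalog, files): catalog maps each fetched IRI to a local file
    name, and files maps that name to the downloaded bytes.
    """
    catalog: Dict[str, str] = {}
    files: Dict[str, bytes] = {}
    pending = [start_iri]
    while pending:
        iri = pending.pop()
        if iri in catalog:
            continue  # already fetched through this exact IRI
        data = fetch(iri)
        # Simplification: name the file after the IRI's last path segment;
        # name clashes and duplicate documents are not handled here.
        filename = posixpath.basename(urlparse(iri).path) or "ontology.owl"
        files[filename] = data
        catalog[iri] = filename
        pending.extend(imports_of(data))
    return catalog, files


def to_catalog_xml(catalog: Dict[str, str]) -> str:
    """Serialize the mapping as OASIS catalog <uri name=... uri=.../> entries."""
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for iri, filename in sorted(catalog.items()):
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", name=iri, uri=filename)
    return ET.tostring(root, encoding="unicode")
```

A real tool would write each entry of `files` into the chosen directory and save the output of `to_catalog_xml` alongside them.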

To be fully robust, this algorithm will have to be a little more complicated. For example, we have seen ontologies where the same ontology is imported using multiple distinct URIs. The download algorithm therefore needs to detect such duplicates and do the right thing both in how it saves the files and in how it updates the XML catalog.
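One way to handle duplicate imports, sketched below under the assumption that two URIs refer to the same ontology exactly when they return byte-identical documents, is to key saved files by a content hash; the same bookkeeping also stops two different documents with the same file name from overwriting each other. `CatalogBuilder` is a hypothetical helper, and comparing ontology IRIs instead of raw bytes would be a plausible alternative test.

```python
import hashlib
import posixpath
from typing import Dict
from urllib.parse import urlparse


class CatalogBuilder:
    """Accumulates downloaded documents and catalog entries (hypothetical helper)."""

    def __init__(self) -> None:
        self.catalog: Dict[str, str] = {}     # imported IRI -> local file name
        self.files: Dict[str, bytes] = {}     # local file name -> contents
        self._by_digest: Dict[str, str] = {}  # SHA-256 of contents -> file name

    def add(self, iri: str, data: bytes) -> str:
        """Record that `iri` returned `data`; return the local file it maps to."""
        digest = hashlib.sha256(data).hexdigest()
        filename = self._by_digest.get(digest)
        if filename is None:
            # First time these bytes are seen: save them under a fresh name.
            filename = self._fresh_name(iri)
            self.files[filename] = data
            self._by_digest[digest] = filename
        # Every IRI that produced these bytes points at the same file.
        self.catalog[iri] = filename
        return filename

    def _fresh_name(self, iri: str) -> str:
        """Last path segment of the IRI, suffixed -2, -3, ... if already taken."""
        base = posixpath.basename(urlparse(iri).path) or "ontology.owl"
        stem, ext = posixpath.splitext(base)
        name, n = base, 1
        while name in self.files:
            n += 1
            name = f"{stem}-{n}{ext}"
        return name


builder = CatalogBuilder()
builder.add("http://www.tigraworld.com/protege/continuedFractions.owl", b"<rdf:RDF/>")
# An illustrative second spelling of the same location:
builder.add("http://tigraworld.com/protege/continuedFractions.owl", b"<rdf:RDF/>")
print(builder.catalog)
```

Both IRIs end up mapped to a single saved file, so the catalog gains two entries but the disk holds one copy.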

Using XML Base

This algorithm is a heuristic and can be turned off or overridden. It is the recommended heuristic because it is fast and generally returns the information that the user needs.

The xml:base of a document gives the base location against which its relative references are resolved, and in ontology files it is typically set to the location where the file is published. Thus if the continuedFractions.owl file is found on disk (in RDF/XML format), it is very likely that its xml:base will be

    http://www.tigraworld.com/protege/continuedFractions.owl

This would suggest that an import statement of the form

   import http://www.tigraworld.com/protege/continuedFractions.owl

can safely be redirected to the continuedFractions.owl file on disk, and the XML catalog can be updated accordingly.
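Reading the xml:base needs nothing more than an XML parser. A minimal Python sketch, assuming the RDF/XML document declares xml:base on its root `rdf:RDF` element (the usual place) and using hypothetical function names:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def xml_base_of(rdf_xml: bytes) -> Optional[str]:
    """The xml:base declared on the document element, or None."""
    return ET.fromstring(rdf_xml).get(XML_BASE)


def entry_from_xml_base(filename: str, rdf_xml: bytes) -> Dict[str, str]:
    """Guess a catalog entry: imports of the xml:base IRI go to this local file."""
    base = xml_base_of(rdf_xml)
    return {base: filename} if base else {}


doc = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://www.tigraworld.com/protege/continuedFractions.owl"/>"""
print(entry_from_xml_base("continuedFractions.owl", doc))
```

A file without an xml:base yields no guess, leaving the decision to the other heuristics or to the user.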


Using the Ontology IRI or Version IRI

If an ontology has a version IRI, then it should be the case that this ontology can be retrieved by turning the version IRI into a URL. Thus if I have a file on disk called determinants.owl which has an ontology IRI of

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI of

   http://www.tigraworld.com/protege/determinants-1.0.owl

then this version of the ontology should be found at the location

   http://www.tigraworld.com/protege/determinants-1.0.owl

This means that when I find such a determinants.owl file on disk, I can guess that the import directive

   import http://www.tigraworld.com/protege/determinants-1.0.owl

can be safely redirected to the determinants.owl file on the disk.

Similarly, if an ontology has an ontology IRI but no version IRI, then it should be possible to find the ontology using its ontology IRI, so an import of that IRI can be redirected to the file on disk. Both of these cases should work and are fairly safe heuristics.
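The two identifier-based heuristics can be sketched together. The Python below is a hypothetical illustration: it reads the owl:Ontology header of an RDF/XML file (resolving relative references such as `rdf:about=""` against xml:base), maps the version IRI to the file when there is one, and otherwise maps the ontology IRI. It assumes the header is a direct child of `rdf:RDF` and that the version is given with the OWL 2.0 `owl:versionIRI` property.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def ontology_id(rdf_xml: bytes) -> Tuple[Optional[str], Optional[str]]:
    """(ontology IRI, version IRI) from the owl:Ontology header, or (None, None)."""
    root = ET.fromstring(rdf_xml)
    base = root.get(XML_BASE, "")
    header = root.find(f"{{{OWL}}}Ontology")
    if header is None:
        return None, None
    about = header.get(f"{{{RDF}}}about")
    version = header.find(f"{{{OWL}}}versionIRI")
    resource = version.get(f"{{{RDF}}}resource") if version is not None else None
    # Relative references (e.g. rdf:about="") are resolved against xml:base.
    ontology_iri = urljoin(base, about) if about is not None else None
    version_iri = urljoin(base, resource) if resource is not None else None
    return ontology_iri, version_iri


def entry_from_ids(filename: str, rdf_xml: bytes) -> Dict[str, str]:
    """Version IRI heuristic, falling back to the ontology IRI when unversioned."""
    ontology_iri, version_iri = ontology_id(rdf_xml)
    if version_iri:
        return {version_iri: filename}
    if ontology_iri:
        return {ontology_iri: filename}
    return {}


determinants = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.tigraworld.com/protege/determinants.owl">
    <owl:versionIRI rdf:resource="http://www.tigraworld.com/protege/determinants-1.0.owl"/>
  </owl:Ontology>
</rdf:RDF>"""
print(entry_from_ids("determinants.owl", determinants))
```

For the versioned determinants file, only the version IRI is mapped, matching the caution above that the ontology IRI should retrieve the latest version, which the local copy may not be.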