Difference between revisions of "How Owl 2.0 Imports Work"

From Protege Wiki
 

Latest revision as of 12:44, November 30, 2009

OWL 2.0 Imports

Under Construction!


Motivation

In OWL 2, imports are handled differently than they are in OWL 1.0. There have been two main changes:

  • added support for versions of an ontology
  • using import by location rather than import by name.

The motivation for the first of these changes is pretty clear. OWL 2.0 supports versions by allowing an ontology to have two IRIs in its name. The first is the ontology IRI. The second is the version IRI for the ontology. In many cases the version IRI will be null. But when the version IRI is not null, the document is a specific version of the ontology.

Thus, for example, I might have an ontology that I am working on which I call

   http://www.tigraworld.com/protege/determinants.owl

After a while I start needing versions of this ontology, so I create an ontology with an ontology IRI

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI

   http://www.tigraworld.com/protege/determinants-1.0.owl

A later published version of this ontology might have the version IRI

   http://www.tigraworld.com/protege/determinants-2.0.owl

The scheme by which these versions are named is not defined by the OWL 2.0 specification.

The intent is that these IRIs can be used to look up an ontology. If an ontology has a version IRI, then following the version IRI using the specified protocol should retrieve the ontology with that version. Thus version 1.0 of the determinants ontology can be found at the web location

   http://www.tigraworld.com/protege/determinants-1.0.owl

Following the ontology IRI, e.g.

   http://www.tigraworld.com/protege/determinants.owl

should retrieve the latest version of that ontology (which may or may not have a version IRI). However, it should be noted that the reason OWL 2.0 uses import by location is that it is often the case that ontologies cannot be retrieved by name.

When importing, these two names allow ontology developers to specify which version of an ontology they want to import. They can specify a particular version of an ontology by importing the version IRI. They can specify the latest version of an ontology by importing the ontology IRI.
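As a rough sketch of the naming convention just described (not something the OWL 2.0 documents or any tool provide in this form), the following Python function lists the locations from which an ontology document would be expected to be retrievable. The function name `expected_locations` and the `is_current_version` flag are hypothetical, introduced only for illustration.

```python
from typing import List, Optional


def expected_locations(ontology_iri: Optional[str],
                       version_iri: Optional[str],
                       is_current_version: bool = False) -> List[str]:
    """Return the IRIs at which the ontology document should be retrievable.

    Illustrative only: this encodes the convention described in the text,
    not the behaviour of any particular tool.
    """
    if ontology_iri is None:
        # No name at all, so the convention gives no location.
        return []
    if version_iri is None:
        # Unversioned ontology: the ontology IRI is the only candidate.
        return [ontology_iri]
    # Versioned ontology: the version IRI always identifies this version.
    locations = [version_iri]
    if is_current_version:
        # The latest version should also be reachable via the ontology IRI.
        locations.append(ontology_iri)
    return locations


base = "http://www.tigraworld.com/protege/"
print(expected_locations(base + "determinants.owl", None))
print(expected_locations(base + "determinants.owl",
                         base + "determinants-2.0.owl",
                         is_current_version=True))
```

An importing ontology that wants a fixed version would use the first entry of the versioned result. One that wants to track the latest version would use the ontology IRI.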

The second change to OWL 2.0 imports is the main subject of this note. OWL 2.0 uses an import by location scheme rather than the import by name scheme used in OWL 1.0. This simply means that an import declaration is a directive to import the ontology that can be found at the physical location represented by the imported IRI. The reason that OWL 2.0 changed to import by location is that in many cases ontologies cannot be found by name. This meant that many OWL ontologies could not use the import by name scheme to do their imports, because there would then be no way for applications or users to find the imported ontology. With import by location, the importing ontology always states where the imported ontology can be found.

Offline Editing and XML Catalogs

The disadvantage of the import by location scheme is that it adds a bit of complexity when a user wants to download some ontologies from the internet and either edit them on the hard drive or work with them while offline. To make this concrete, suppose that there are two ontologies on the web at the locations

    http://www.tigraworld.com/protege/determinants.owl

and

    http://www.tigraworld.com/protege/continuedFractions.owl

Suppose that the determinants.owl ontology imports the continuedFractions.owl ontology with the following import declaration:

    import http://www.tigraworld.com/protege/continuedFractions.owl

If the user downloads these ontologies to his disk and invokes an ontology editing tool on the determinants.owl ontology, the ontology editing tool will naturally import the continuedFractions.owl ontology from its web site at

    http://www.tigraworld.com/protege/continuedFractions.owl

If the user wants the import of continuedFractions.owl to be redirected to the copy of the continuedFractions.owl ontology on the user's local disk, the user needs to use XML Catalogs. XML Catalogs allow the user to specify that the process of resolving the URL

    http://www.tigraworld.com/protege/continuedFractions.owl

be redirected to a specific location on the local drive. For users who are familiar with Protege 3.4 ontology repositories, the XML catalog will play a role very similar to that of the .repository files in Protege 3.4. The big advantage of XML catalogs is that they are a standard mechanism that can be used by any tool that understands OWL.

Thus XML Catalogs will become an essential part of sharing ontologies. It is therefore important that tools support a variety of mechanisms for generating XML Catalogs.
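To make the redirection concrete, here is a minimal Python sketch of how a tool might read `uri` entries from an OASIS XML catalog and apply them to an import. The catalog content, the local file name and the function names `load_uri_mappings` and `resolve_import` are assumptions made for illustration. Real tools may support more of the catalog format (for example rewrite rules or nested catalogs) than this does.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace used by OASIS XML Catalog documents.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: redirects the import to a file next to the catalog.
EXAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://www.tigraworld.com/protege/continuedFractions.owl"
       uri="continuedFractions.owl"/>
</catalog>
"""


def load_uri_mappings(catalog_xml: str) -> Dict[str, str]:
    """Collect the name -> uri pairs from the catalog's <uri> entries."""
    root = ET.fromstring(catalog_xml)
    mappings = {}
    for entry in root.iter("{%s}uri" % CATALOG_NS):
        mappings[entry.get("name")] = entry.get("uri")
    return mappings


def resolve_import(import_iri: str, mappings: Dict[str, str]) -> str:
    """Return the redirected location, or the IRI itself if unmapped."""
    return mappings.get(import_iri, import_iri)


mappings = load_uri_mappings(EXAMPLE_CATALOG)
print(resolve_import(
    "http://www.tigraworld.com/protege/continuedFractions.owl", mappings))
```

An import with no catalog entry falls through unchanged, so the tool would still try the web location for it.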

Building XML Catalogs

In this section, we will consider the problem of automatically generating XML catalogs. Assuming that a set of files has been downloaded from the internet to a disk, the question we want to ask is

Where can these files be found on the internet?

Obviously, in general this question cannot be answered. However, ontologies contain a few pointers that are supposed to indicate where they can be found, specifically the XML base, the ontology version IRI and the ontology name. Note that the preferred approach when users are sharing OWL ontologies through e-mail is that they provide an XML catalog with the OWL files that they share. This would make the automatic generation of the XML catalog unnecessary.

Generating XML Catalogs During Download

This is the most accurate way of building an XML catalog. The other methods described here are heuristics and as such are optional and can be overridden by a user. I will describe this process with a simple example.

Suppose a user wants to download the ontology

    http://www.tigraworld.com/protege/determinants.owl

and its imports to disk. As part of this download process, Protege will build an xml catalog which reflects where each of the imports was found. So when the Protege tool processes the import statement

    import http://www.tigraworld.com/protege/continuedFractions.owl

it will convert

    http://www.tigraworld.com/protege/continuedFractions.owl

to a URL and download what it finds into a file (probably continuedFractions.owl) on the hard disk. When it does this, it can add an entry to the XML catalog reflecting that an import of

    http://www.tigraworld.com/protege/continuedFractions.owl

should be redirected to the continuedFractions.owl file on the disk.

To be fully robust, this algorithm will have to be a little bit more complicated than this. For example, we have seen ontologies where the same ontology is imported using multiple distinct URIs. This means that the download algorithm would need to detect duplicates and do the right thing both in terms of how it saves the files and how it updates the XML catalog.
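The download process might be sketched in Python as follows. This is only an illustration under several assumptions, not how Protege actually does it. The documents are assumed to be RDF/XML, the `fetch` function is supplied by the caller, and duplicates are detected by comparing a hash of the downloaded content, which is just one possible strategy. The names `find_imports` and `download_closure` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

OWL_IMPORTS = "{http://www.w3.org/2002/07/owl#}imports"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def find_imports(rdf_xml: bytes) -> List[str]:
    """List the IRIs named by owl:imports in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    return [el.get(RDF_RESOURCE) for el in root.iter(OWL_IMPORTS)
            if el.get(RDF_RESOURCE)]


def download_closure(start_iri: str, fetch: Callable[[str], bytes]
                     ) -> Tuple[Dict[str, str], Dict[str, bytes]]:
    """Fetch an ontology and its imports, recording catalog entries.

    Returns (catalog, files): catalog maps each import IRI to a local file
    name, and files maps each local file name to the downloaded content.
    """
    catalog: Dict[str, str] = {}
    files: Dict[str, bytes] = {}
    by_digest: Dict[str, str] = {}   # content hash -> local file name
    pending = [start_iri]
    while pending:
        iri = pending.pop()
        if iri in catalog:
            continue                 # this IRI was already handled
        content = fetch(iri)
        digest = hashlib.sha256(content).hexdigest()
        if digest in by_digest:
            # Same document reached through a different IRI: reuse the file
            # and only add a second catalog entry pointing at it.
            catalog[iri] = by_digest[digest]
            continue
        name = iri.rstrip("/").rsplit("/", 1)[-1] or "ontology.owl"
        while name in files:
            name = "_" + name        # avoid clobbering a different document
        files[name] = content
        by_digest[digest] = name
        catalog[iri] = name
        pending.extend(find_imports(content))
    return catalog, files
```

A caller would pass something like a thin wrapper around `urllib.request.urlopen` as `fetch`, write `files` to disk and then serialize `catalog` as `uri` entries in the XML catalog.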

Using XML Base

This algorithm is a heuristic and can be turned off or overridden. This is the recommended algorithm because it is fast and it generally returns the information that the user needs.

In the specification of XML Base, it is stated that the XML base should represent the location where a file can be found. Thus if the continuedFractions.owl file is found on disk (in RDF/XML format), it is very likely that its XML base will be

    http://www.tigraworld.com/protege/continuedFractions.owl

This would suggest that an import statement of the form

    import http://www.tigraworld.com/protege/continuedFractions.owl

can safely be redirected to the continuedFractions.owl file on disk and the xml catalog can be updated accordingly.
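A minimal Python sketch of this heuristic, assuming the file is RDF/XML and declares `xml:base` on its root element, could look like the following. The function names are hypothetical, and a file with no `xml:base` simply yields no catalog entry.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# ElementTree exposes the xml:base attribute under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def read_xml_base(rdf_xml: str) -> Optional[str]:
    """Return the xml:base declared on the root element, if any."""
    return ET.fromstring(rdf_xml).get(XML_BASE)


def catalog_entry_from_xml_base(rdf_xml: str,
                                local_file: str) -> Optional[Tuple[str, str]]:
    """Guess a (web location, local file) catalog entry from xml:base."""
    base = read_xml_base(rdf_xml)
    if base is None:
        return None   # the heuristic does not apply to this file
    return (base, local_file)


# Hypothetical minimal document standing in for continuedFractions.owl.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://www.tigraworld.com/protege/continuedFractions.owl">
</rdf:RDF>
"""

print(catalog_entry_from_xml_base(SAMPLE, "continuedFractions.owl"))
```

Because only the root element is inspected, this stays fast even for large ontologies, which fits the reason given above for recommending this heuristic.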


Using the Ontology IRI or Version IRI

If an ontology has a version IRI, then it should be the case that this ontology can be retrieved by turning the version IRI into a URL. Thus if I have a file on disk called determinants.owl which has an ontology IRI of

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI of

   http://www.tigraworld.com/protege/determinants-1.0.owl

then this version of the ontology should be found at the location

   http://www.tigraworld.com/protege/determinants-1.0.owl

This means that if I have a file on disk called determinants.owl which has an ontology IRI of

   http://www.tigraworld.com/protege/determinants.owl

and a version IRI of

   http://www.tigraworld.com/protege/determinants-1.0.owl

then I can guess that the imports directive of

   import http://www.tigraworld.com/protege/determinants-1.0.owl

can be safely redirected to the determinants.owl file on the disk.

Similarly, if an ontology has a name but no version IRI then it should be possible to find the ontology using the name. Both cases described so far should work and are pretty safe heuristics.
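The two cases can be sketched in Python as below, assuming an RDF/XML file whose `owl:Ontology` header is a direct child of the root and whose version is given with `owl:versionIRI`. The function name `guess_import_iri` and the sample document are hypothetical. Other OWL syntaxes would need a real parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def guess_import_iri(rdf_xml: str) -> Optional[str]:
    """Guess which import IRI can be redirected to this file.

    Prefers the version IRI when present, otherwise falls back to the
    ontology IRI. Returns None when the file has no named ontology header.
    """
    root = ET.fromstring(rdf_xml)
    header = root.find("{%s}Ontology" % OWL)
    if header is None:
        return None
    version = header.find("{%s}versionIRI" % OWL)
    if version is not None and version.get("{%s}resource" % RDF):
        return version.get("{%s}resource" % RDF)
    return header.get("{%s}about" % RDF)


# Hypothetical header for the determinants.owl file used in the text.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.tigraworld.com/protege/determinants.owl">
    <owl:versionIRI
        rdf:resource="http://www.tigraworld.com/protege/determinants-1.0.owl"/>
  </owl:Ontology>
</rdf:RDF>
"""

print(guess_import_iri(SAMPLE))
```

The returned IRI, paired with the name of the file on disk, gives one candidate entry for the XML catalog.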