Difference between revisions of "Protege4ClientServer"

From Protege Wiki
Revision as of 08:41, October 1, 2012

Introduction

These are some pages under development to document the Protege 4 Server. We are hoping to release an early alpha soon.

What is it?

The Protege OWL Server provides a platform for collaborative editing and version control of a collection of ontologies. The Protege server tracks changes made to its ontologies, enforces an access control policy for its documents and checks for conflicts between its clients. When used with the Protege client, ontology editors can view and modify a shared ontology in parallel. If an editor chooses, the editor can watch changes made by other editors as they occur. To change an ontology, an editor first makes the changes in his local copy of the ontology. When he is happy with his changes, he can commit them, making them available to other editors of the ontology. Alternatively, an editor making changes to his local copy can save his copy of the changes and commit them in a later session.

In addition, the Protege OWL Server can be used as something more like a simple version control system. We are developing a set of command line tools that will be able to use a Protege OWL Server to provide such traditional version control services as checkin, checkout, update, commit and history query commands. The Protege 4 client can be used in this manner as well: an ontology editor can choose not to turn on auto-update and make all his updates and commits manually.

Comparison with the Protege 3 Server

There are several differences between the Protege 3 Server and the Protege 4 Server:

  • The local copy. In Protege 3, when a client connects to the server, any change made in the client is immediately reflected on the server. In Protege 4, in contrast, changes only get propagated to the Protege server when the user commits them. This allows a user of a Protege client to consider his changes before sending them to the server. This is a significant enough concept that we describe it in more detail below.
  • Decoupled client-server. In Protege 3 when the server goes down or the network is interrupted, the Protege 3 client either freezes or crashes. In contrast, in Protege 4, if the server stops or is inaccessible, the Protege client continues running normally. It is only when some server operation is attempted, such as an update or commit, that the user may become aware that there is a problem communicating with the server.
  • Commit granularity. In Protege 3, changes are sent to the server as they are made. In Protege 4, a collection of changes is committed only when the user is ready, and the user is able to add a commit comment describing the nature of the changes.
  • Optional automatic update. In Protege 3, a user sees edits from other users as they occur. In Protege 4, this is optional.

The Local Copy/Sandbox

With the Protege 4 client server, when a user checks an ontology out from the server, he gets a separate copy of the server ontology. The user can then modify this copy in any way that he likes and the changes will not go to the server until the user commits the changes.

In fact, this local copy can be saved to disk and then even edited with an editor other than Protege before it is committed to the server. Specifically, a user can

  1. start Protege and load an ontology from the server,
  2. save the ontology somewhere on the local disk,
  3. exit Protege and edit the ontology with a text editor,
  4. restart Protege and open the ontology from disk,
  5. commit the changes, which will include the changes made with the text editor.

What happens is that when the file is saved, Protege also saves some files that record which server provides the ontology document, the location of the document on the server and the revision of the ontology document on the server. So if I save an ontology as Thesaurus-redmond.owl in the client.ontologies directory, then Protege saves the following files:

  - client.ontologies
       o Thesaurus-redmond.owl
       - .owlserver
            o Thesaurus-redmond.owl.history
            o Thesaurus-redmond.owl.vontology

The Thesaurus-redmond.owl.vontology file contains information that describes the relationship between the ontology in Thesaurus-redmond.owl and the document on the server. The Thesaurus-redmond.owl.history file contains a local cache of the history of changes made to the ontology document on the server. It is not required (if it is deleted it will be rebuilt), but it provides significant performance advantages for the client, especially when the network is slow or the ontology is large.
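
The layout above can be sketched as a small shell helper that, given the path of a saved ontology, prints where its two companion files would be expected. The function name companion_files is hypothetical; only the .owlserver directory and the .vontology and .history suffixes come from the description above.

```shell
# Hypothetical helper: print the companion files that are described above
# as being stored next to a saved ontology.
companion_files() {
    ontology=$1
    dir=$(dirname "$ontology")
    name=$(basename "$ontology")
    # Records the server, the document location and the revision.
    echo "$dir/.owlserver/$name.vontology"
    # Local cache of the server-side change history (rebuilt if deleted).
    echo "$dir/.owlserver/$name.history"
}

companion_files client.ontologies/Thesaurus-redmond.owl
```

A helper like this could be used to confirm that both files are still in place before an ontology edited outside Protege is reopened and committed.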

Videos

Here is the first of several videos that I am going to make to demonstrate server features:

Performance considerations for large ontologies on a slow network

When a large ontology is being uploaded to or downloaded from a server on a slow network, it can take a while to transfer all the necessary data. Unfortunately, we have not yet determined how best to monitor and report the progress of this operation, so the user doing the upload or download will have little indication of the progress. The good news here is that the user only needs to experience this once for the initial upload of the ontology to the server and once for his initial download of the ontology. In addition, since the upload of an ontology is a one-time operation, it is very likely that it can be performed on a faster network.

The issue concerns the change history stored on the server representing the set of changes between revision 0 and revision 1. These changes consist of the full set of changes needed to create the initial version of the ontology. Thus, for instance, if the NCI Thesaurus is uploaded onto the server, the set of changes to go from revision 0 (the empty ontology) to revision 1 (the initial version of the Thesaurus on the server) will contain over 1.2 million individual changes. This change set is stored in a 300 MB file which then needs to be transferred to any client that wants a copy of the ontology. (In point of fact, this change set gets compressed before it hits the network, so the actual data copied across the wire is only about 44 MB.)
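
To get a feel for what these sizes mean on a slow link, the following sketch computes an idealized transfer time. The 1 Mbit/s link speed is an assumption chosen purely for illustration, as is the function name transfer_seconds; the calculation treats 1 MB as 8 Mbit and ignores protocol overhead and latency.

```shell
# transfer_seconds SIZE_MB LINK_MBIT_PER_S
# Idealized transfer time in whole seconds (no overhead, no latency).
transfer_seconds() {
    awk -v mb="$1" -v bw="$2" 'BEGIN { printf "%d\n", mb * 8 / bw }'
}

# Sizes from the text; the 1 Mbit/s link is an illustrative assumption.
echo "compressed (44 MB):    $(transfer_seconds 44 1) s"
echo "uncompressed (300 MB): $(transfer_seconds 300 1) s"
```

Under that assumption the compressed change set takes roughly six minutes to transfer, where the uncompressed file would take about forty.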

Once the ontology is downloaded to the client, the client can save the ontology with the change history to disk for later reference. When the ontology is reloaded from the disk, the client will already have a copy of the 1.2 million changes from revision 0 to revision 1 and will not need to download it again from the server.

Installation details

Hopefully at the point of the release we will have automated the installation process and these pages will be for advanced users.

The purpose of these pages is to allow people to know what is installed by the server installation and what options the user can change. The installation setup is slightly different depending on whether the operating system in question is Linux, OS X or Windows. The Protege Server should run on other platforms as well, though we don't yet support its installation there. The key things that need to be figured out for an installation on some other platform are obtaining a version of Java that is at least Java 1.6 and determining how to make the Protege 4 Server start at boot time.
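
On a platform without a supported installer, the Java requirement can be checked by hand. The sketch below is one possible way to compare a version string against the 1.6 minimum; the function name java_is_at_least_6 is hypothetical, and the parsing assumes version strings of the form 1.x.y (older releases) or x.y.z (newer releases).

```shell
# java_is_at_least_6 VERSION
# Accepts strings such as "1.7.0_06" (old scheme) or "11.0.2" (new scheme).
java_is_at_least_6() {
    major=${1%%.*}
    if [ "$major" = "1" ]; then
        # Old scheme: the number after the leading "1." is the real version.
        rest=${1#1.}
        major=${rest%%.*}
    fi
    [ "$major" -ge 6 ]
}

# The version string can be read from the first line of `java -version`,
# which prints to standard error, for example:
#   java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
if java_is_at_least_6 "1.7.0_06"; then
    echo "1.7.0_06 satisfies the Java 1.6 minimum"
fi
```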


The protege user account

Depending on a Protege server administrator's security requirements, the administrator may want to run the Protege server from an unprivileged system account. The Protege server does not require very many privileges; it requires a port that it can listen on and read-write access to a portion of a file system. For this reason, in accordance with the principle of least privilege, it makes sense that the Protege server should run under an account that is distinct from any regular user's account (so that it does not necessarily have the potential to access any user files) and distinct from any account with special privileges, such as the root account.

We are attempting to support this mode of operation in the scripts that allow Protege to run as a daemon on the Linux, OS X and Windows platforms. The current status is as follows:

  • On Linux we have worked out all the details and have the Protege server running on a couple of machines in this mode. The command to create a system user on a Linux system is as follows:
    adduser --system --home /usr/local/protege protege
  • On OS X we have the ability to specify the user account which will run the Protege server, but we have not yet determined how to make the account a system user account rather than a regular user account. There is a discussion on the web (http://bsteinberg.wordpress.com/) describing how to use directory services to make this work, and I seem to remember this from the days when I was an OS X user.
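
Before pointing the daemon scripts at a dedicated account, it can be useful to confirm that the account actually exists. The following is a minimal sketch using the standard id utility; the function name user_exists is hypothetical, and protege is the account name used in the adduser command above.

```shell
# user_exists NAME: succeed if the account is known to the system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

if user_exists protege; then
    echo "protege account exists with UID $(id -u protege)"
else
    echo "protege account not found; it needs to be created first"
fi
```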

Linux

On a Linux system the following files and directories are created:

  • /usr/local/protege which contains the core server installation in the subdirectory server, the users' data files in the subdirectory data and some command line utilities in the bin subdirectory.
  • /etc/init.d/protege which is a script that ensures that the Protege server is started at boot time.
  • /etc/default/protege which is a properties file that configures the init.d script above. The user will have to modify this file before the server will run correctly.
  • /etc/rc#.d/K20protege which are symbolic links to /etc/init.d/protege for '#=0,1,6'. These scripts ensure that the Protege Server is correctly shut down when the computer is stopped. In particular, if the server has unsaved files (a temporary condition in any case), this script gives Protege some time to save the files before the system exits. The best way to configure these files is through the update-rc.d script as explained below.
  • /etc/rc#.d/S20protege which are symbolic links to /etc/init.d/protege for '#=2,3,4,5'. These scripts ensure that the Protege Server is running at system startup. The best way to configure these files is through the update-rc.d script as explained below.

When the protege server runs, it will write some system logs to the /var/log/protege directory.
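
The runlevel links listed above can be checked with a short script. This is only a sketch: the function name check_rc_links is hypothetical, and the K20/S20 names are taken from the list above, so a system that assigns different sequence numbers would need the names adjusted.

```shell
# check_rc_links ROOT
# ROOT is normally empty, so that paths start at /etc; a non-empty ROOT
# lets the check run against a scratch directory instead.
check_rc_links() {
    root=$1
    missing=0
    # Stop links for halt, single-user mode and reboot.
    for level in 0 1 6; do
        if [ ! -L "$root/etc/rc$level.d/K20protege" ]; then
            echo "missing: /etc/rc$level.d/K20protege"
            missing=1
        fi
    done
    # Start links for the multi-user runlevels.
    for level in 2 3 4 5; do
        if [ ! -L "$root/etc/rc$level.d/S20protege" ]; then
            echo "missing: /etc/rc$level.d/S20protege"
            missing=1
        fi
    done
    return "$missing"
}

if check_rc_links ""; then
    echo "all runlevel links present"
fi
```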

An example /etc/default/protege file looks like this:

#
# This file goes into /etc/default/protege and holds the default
# settings for the Protege Server.
#

HOSTNAME=`hostname`
PROTEGE_SERVER_PREFIX=/usr/local/protege
PROTEGE_SANDBOX_USER=tredmond
JAVA_CMD=/usr/local/java/jdk1.7.0_06/bin/java
PID=/var/log/protege/PID

The HOSTNAME property tells the Protege 4 server how to advertise itself to the world. On a well-configured desktop or server machine that is not hidden behind NAT, the given setting will usually work. If not, an IP address works fine. The sandbox user parameter (PROTEGE_SANDBOX_USER) is important and must be changed. This is a user account on the system that is set aside to run the server. Ideally it would be a user account that has minimal access to the system as a whole, except for write access to the /usr/local/protege/data directory.

To configure the /etc/rc#.d scripts, first install the protege script into the /etc/init.d directory. Then run the sudo update-rc.d protege defaults command; the results should look something like this:

Neptune:init.d% sudo update-rc.d protege defaults
 Adding system startup for /etc/init.d/protege ...
   /etc/rc0.d/K20protege -> ../init.d/protege
   /etc/rc1.d/K20protege -> ../init.d/protege
   /etc/rc6.d/K20protege -> ../init.d/protege
   /etc/rc2.d/S20protege -> ../init.d/protege
   /etc/rc3.d/S20protege -> ../init.d/protege
   /etc/rc4.d/S20protege -> ../init.d/protege
   /etc/rc5.d/S20protege -> ../init.d/protege
Neptune:init.d% 
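
Since the init.d script depends on the settings in /etc/default/protege, it can help to sanity-check that file before starting the server. The sketch below sources the file in a subshell and reports any setting that is left empty. The function name check_defaults is hypothetical, and treating all four settings as required is an assumption based on the example file, not something the init.d script is known to enforce. Note that sourcing executes the file, so this should only be run on a file you trust.

```shell
# check_defaults FILE
# Runs in a subshell so that the sourced settings do not leak into the caller.
check_defaults() (
    # Start from a clean slate so inherited variables cannot mask a gap.
    unset PROTEGE_SERVER_PREFIX PROTEGE_SANDBOX_USER JAVA_CMD PID
    . "$1" || exit 1
    status=0
    for name in PROTEGE_SERVER_PREFIX PROTEGE_SANDBOX_USER JAVA_CMD PID; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            status=1
        fi
    done
    exit "$status"
)

if [ -r /etc/default/protege ]; then
    check_defaults /etc/default/protege && echo "settings look complete"
fi
```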

The Server can be stopped with the command

               sudo /etc/init.d/protege stop

The Server can be started with the command

               sudo /etc/init.d/protege start

The Server can be restarted with the command

               sudo /etc/init.d/protege restart

The following command will make a first-cut estimate of the status of the server:

               sudo /etc/init.d/protege status
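
A similar rough check can be made by hand from the PID file named in /etc/default/protege. The sketch below assumes that the file contains nothing but the server's process ID, which is not confirmed by this page; the function name pid_file_alive is hypothetical. Because kill -0 fails for processes owned by another user, the check should be run as root or as the account that runs the server.

```shell
# pid_file_alive FILE
# Succeeds if FILE holds the PID of a process that currently exists
# and that the caller is permitted to signal.
pid_file_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # Signal 0 performs the existence check without sending a signal.
    kill -0 "$pid" 2>/dev/null
}

if pid_file_alive /var/log/protege/PID; then
    echo "server process is running"
else
    echo "no running server process found"
fi
```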

OS X

On an OS X system the following files and directories are created:

  • /usr/local/protege which contains the core server installation in the subdirectory server, the users' data files in the subdirectory data and some command line utilities in the bin subdirectory.
  • /Library/LaunchDaemons/org.protege.owl.server.plist which is a launchctl file to ensure that the Protege Server starts at boot time. This file must be edited in order for the Protege Server to be automatically started.

When the Protege server runs, it will write some system logs to the /var/log/protege directory.

The launchctl file is as follows: org.protege.owl.server.plist

The two options that must be changed are circled in the above diagram. As in the Linux case, the username is the user under which the Protege server runs. Ideally this user has minimal access to the system as a whole except for write access to the /usr/local/protege/data and /var/log/protege directories.

The server can be restarted with the command

       sudo launchctl stop org.protege.owl.server

It will restart immediately after stopping.

Windows

To be determined.