Difference between revisions of "Troubleshooting Client Server Connections"

From Protege Wiki
Revision as of 11:55, March 5, 2010

Client-Server Troubleshooting

This page is in the very early stages.



Basics

In troubleshooting, it is very useful to understand that the client will first connect to the rmi registry to get a reference to the server and then will use the server reference to contact the server. This protocol is discussed here (ignore the black magic tricks unless they help you get a better understanding).

The first thing to do is to look at any messages on the client and server console. The most fundamental error that can arise is the following exception which appears on the client console:

SEVERE: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is: 
	java.net.ConnectException: Connection refused
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
	at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
	at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
	at java.rmi.Naming.lookup(Naming.java:84)
	at edu.stanford.smi.protege.server.ServerPanel.connectToHost(ServerPanel.java:140)

Note that the exception happens when trying to call Naming.lookup. Usually the client will pop up a window that says Unable to Connect to Server. This means that the client could not connect to the rmiregistry. Possibly the rmiregistry is not running or there is a firewall or other network problem.

A second, very different error that appears on the client console is the following exception:

SEVERE: java.rmi.ConnectException: Connection refused to host: 67.180.198.51; nested exception is: 
	java.net.ConnectException: Connection timed out
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:94)
	at edu.stanford.smi.protege.server.Server_Stub.openSession(Unknown Source)
	at edu.stanford.smi.protege.server.ServerPanel.createSession(ServerPanel.java:154)

This is usually accompanied by the message Failed to create a session because of either an invalid username/password combination, or a firewall problem. Unfortunately this pop-up dialog is very ambiguous and can mean many things, which is why we have tried to add some additional information to help. In any case, the exception tells us two very important things. First, the problem occurs during an attempt to create a session (ServerPanel.createSession). Second, it occurs when the client makes a call to the server (Server_Stub.openSession(Unknown Source)). In particular, since the client is talking to the server, it has already talked to the rmiregistry to obtain a reference to the server. That is, this exception means that the client has gotten quite a bit further than it had in the previous case.
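The reasoning above can be applied mechanically to a saved client log. What follows is only a sketch: the function name classify_rmi_failure is made up for this page and is not part of Protege, and it does nothing more than look for the stack frames quoted in the two exceptions above.

```shell
# Hypothetical helper, not part of Protege.
# Reads a client log on standard input and reports which of the two
# connections appears to have failed, based on the stack frames present.
classify_rmi_failure() {
    log=$(cat)
    case "$log" in
        # The client was already calling the server, so the registry
        # lookup must have succeeded earlier.
        *"Server_Stub.openSession"*) echo "server" ;;
        # The failure happened while asking the rmiregistry for the server.
        *"RegistryImpl_Stub.lookup"*|*"Naming.lookup"*) echo "registry" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, classify_rmi_failure < client.log (where client.log is whatever file you saved the console output to) prints registry for the first exception shown above and server for the second.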

Note that the connection is being refused to the host 67.180.198.51. It is important to look at this because this is often the problem: the connection is being made to the wrong place. There is another message that sheds more light on this issue:

Server ref = Server_Stub[UnicastRef2 [liveRef: [endpoint:[67.180.198.51:38547,edu.stanford.smi.protege.server.socket.RmiSocketFactory@2](remote),objID:[-4da377e5:122e64da065:-8000, 0]]]]

This is a printout of the server reference that (1) the server passed to the rmiregistry at server startup and (2) the rmiregistry gave to the client to tell it how to contact the server. Note the endpoint:

    67.180.198.51:38547

This is the host and port that the rmiregistry thinks should be used to connect to the server. The Connection Refused exception suggests that there is some problem with this.
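If you have a Server ref line in a log, the endpoint can be pulled out with a little sed. This is a sketch under the assumption that the line has the same layout as the one shown above; extract_endpoint is a hypothetical name, not something Protege provides.

```shell
# Hypothetical helper, not part of Protege.
# Prints the host:port pair found after "endpoint:[" in a "Server ref" line.
# Prints nothing if the line does not have the expected layout.
extract_endpoint() {
    printf '%s\n' "$1" | sed -n 's/.*endpoint:\[\([^,]*\),.*/\1/p'
}

# Splitting the result into host and port with plain parameter expansion:
#   endpoint=$(extract_endpoint "$line")
#   host=${endpoint%:*}     # everything before the last colon
#   port=${endpoint##*:}    # everything after the last colon
```

The host and port obtained this way are the ones to compare against what you expect the server to be advertising.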

One problem that can occur is that the server misrepresents itself when talking to the rmiregistry. At this point it is worth taking a glance at the first few paragraphs of the description of the rmi protocol. The server decides how to represent itself to the rmiregistry based on the line:

    -Djava.rmi.server.hostname=`hostname`

Sometimes this line doesn't do the right thing because the hostname command returns the wrong value. For instance, perhaps the hostname command returns smi-tredmond-li instead of smi-tredmond-li.stanford.edu. Depending on your network configuration, the client may be able to resolve the fully qualified name but not the short one. Another value that the hostname command sometimes returns is localhost, which will not work for any client on a different machine than the server.
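A rough way to sanity-check the value before the server advertises it is sketched below. The function name check_rmi_hostname and its three categories are assumptions made for illustration; in particular, a short name is only a warning sign, since whether clients can resolve it depends on your network.

```shell
# Hypothetical helper, not part of Protege.
# Classifies a candidate value for java.rmi.server.hostname.
check_rmi_hostname() {
    case "$1" in
        # Loopback names and addresses only work for clients on the server machine.
        localhost|localhost.*|127.*) echo "loopback" ;;
        # Contains a dot: a fully qualified name or an IP address.
        *.*) echo "qualified-or-ip" ;;
        # No dot: a short name that remote clients may not be able to resolve.
        *) echo "short" ;;
    esac
}
```

Running check_rmi_hostname "$(hostname)" on the server machine shows which category the default value falls into.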

The client console/log is not the only place to look for hints about a problem. The server console can also have useful information. For example, if I simply supply the wrong user name or password, the server prints out a message

  WARNING: Failed login for user Timothy Redmond IP: 127.0.1.1 -- Server.openSession()

In addition no server or client-side exceptions are generated. This means that the client successfully contacted the server, the server successfully processed the request but rejected it because the credentials were wrong.


Versioning Problems and UnmarshalExceptions

In order to have a Protege client work reliably with a Protege server, both must be running exactly the same version of Protege. There are several things that can go wrong when the version numbers are different. But the most obvious exception is an UnmarshalException which looks something like the following:

WARNING: Could not connect to remote project Collaborative Pizza -- java.lang.RuntimeException: java.rmi.UnmarshalException: error unmarshalling return; nested exception is: 
	java.io.InvalidClassException: edu.stanford.smi.protege.server.update.ValueUpdate; local class incompatible: stream classdesc serialVersionUID = -7753881900765528485, local class serialVersionUID = -4059275656078639103
	at edu.stanford.smi.protege.server.framestore.RemoteClientFrameStore.convertException(RemoteClientFrameStore.java:388)
	at edu.stanford.smi.protege.server.framestore.RemoteClientFrameStore.getFrame(RemoteClientFrameStore.java:509)
	at edu.stanford.smi.protege.model.framestore.ModificationFrameStore.getFrame(ModificationFrameStore.java:26)
	at edu.stanford.smi.protege.model.framestore.ArgumentCheckingFrameStore.getFrame(ArgumentCheckingFrameStore.java:107)

There are some other things that can go wrong when the client and the server have different versions. In particular another type of error that we saw recently involved a client that used certain remote jobs that the server did not know about. This raised the following exception (seen on the client):

WARNING: edu.stanford.smi.protege.exception.ProtegeIOException: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is: 
	java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is: 
	java.lang.ClassNotFoundException: null class
	at edu.stanford.smi.protege.server.framestore.RemoteClientFrameStore.executeProtegeJob(RemoteClientFrameStore.java:1691)
	at edu.stanford.smi.protege.util.ProtegeJob.execute(ProtegeJob.java:94)
         ...

The key hint here is that the error occurred during executeProtegeJob. The error happened on the server (which was not able to find a class passed to it from the client) and is displayed on the client. This type of exception suggests either that the client and server are running different versions of Protege or that the server needs to be running a plugin that is running on the client.


RMI Registry Synchronization

Occasionally a problem arises where the rmiregistry needs to be restarted. So if the client-server has been running fine and you are suddenly having mysterious problems, try killing the rmiregistry and restarting it. I don't know anyone who understands this issue.


Networks, Firewalls and Telnet

It is out of scope for the Protege team to diagnose network problems. There are simply too many things that can go wrong and too many things to know. The primary indicator that you need to start thinking about network issues is that the server works fine on localhost or on the LAN but not from outside the LAN. In this case, telnet is a useful tool for quickly evaluating where a problem is located. All real operating systems, with possibly (grudgingly admitted) one exception, have a telnet command. If your operating system is underconfigured, you can try this link for enabling telnet or [1]. I haven't tried either, but you should also feel free to complain to your vendor for leaving out such a basic diagnostic command.

What telnet does is to allow you to connect to a hostname and a port. If you know a protocol in detail it is also possible (usually not recommended) to talk a bit with a server this way. For example, using the host www.google.com and port 80 you can try GET /. There are also instructions on various web pages for talking with mail servers, etc. In our case, all we are interested in is whether the client can connect. So depending on your telnet client, a successful connection will look something like this:

[tredmond@smi-tredmond-li org.protege.osgi.jdbc.prefs]$ telnet smi-tredmond-li 5100
Trying 127.0.1.1...
Connected to smi-tredmond-li.
Escape character is '^]'.

At this point, with this client, type control-] and then quit and telnet will exit. In the case of a problem, it may take a bit of time for telnet to realize that it cannot connect. In the case that it can connect, it usually connects very quickly.

There are two connections that need to be tested. The first is the connection to the rmiregistry. By default the rmi registry runs on port 1099, but this can be changed when the rmi registry is invoked with the command:

    rmiregistry [port #]

In addition the server is configured to talk to a particular rmiregistry port with the jvm definition

     -Dprotege.rmi.registry.port=[port #]

The rmi registry port that the server is using can also be seen in the server console message near the top:

    Server port = 5200, registry port = 5100, compressed stream

The other connection that needs to be tested is the server connection. The server port can be determined if you have a server reference message in a client's log:

Server ref = Server_Stub[UnicastRef2 [liveRef: [endpoint:[67.180.198.51:38547,edu.stanford.smi.protege.server.socket.RmiSocketFactory@2](remote),objID:[-4da377e5:122e64da065:-8000, 0]]]]

but the easiest way to know the server port is to set it with a jvm definition:

    -Dprotege.rmi.server.port=5200

If telnet cannot connect then neither will Protege and the problem is to figure out why the connection failed. This telnet test can be run from different machines on your network to determine exactly what is causing a problem with the network.
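If telnet is not available, the same two-connection test can be approximated from a shell. This is a minimal sketch, assuming bash (for its /dev/tcp feature) and the coreutils timeout command are installed; the function names port_open and check_protege_ports are invented for this page, and the ports in the usage note are just the ones from the console message above.

```shell
# Hypothetical helpers, not part of Protege.

# Succeeds if a TCP connection to HOST PORT can be opened within 5 seconds.
# Relies on bash's /dev/tcp pseudo-files and on the timeout command.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Tests both connections: the rmiregistry port first, then the server port.
check_protege_ports() {
    host=$1; reg_port=$2; serv_port=$3
    for port in "$reg_port" "$serv_port"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}
```

With the example values from this page, the call would be check_protege_ports smi-tredmond-li 5100 5200. As with telnet, running it from several machines helps narrow down where the connection is being blocked.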

Configuration

To resolve configuration issues arising from NAT firewalls, here's a simple walkthrough that hopefully helps someone.

  • In your Protege client application, edit the "protege.properties" file in the installation directory and add the following line (or modify it if it already exists). Here hostname is the HOSTNAME (not the domain name) and RMI_REG_PORT is the RMI Registry Port, which is 1099 by default but can be changed in the server configuration. The global IP address of the server also works in place of the hostname, but a common mistake is to set this to the local IP address of the server.
edu.stanford.smi.protege.server.ServerPanel.host_name=hostname\:RMI_REG_PORT
  • In your Protege Server installation, make the following changes to "run_protege_server.sh":
    • For the HOSTNAME_PARAM variable, replace `hostname` with the actual hostname of your machine (the same one specified on the client side, so you know it's globally routable). A common mistake is to set this to "localhost" or to a locally routable hostname.
    • Add the following lines before the PORTOPTS variable is defined:
RMI_REG_PORT=1099
RMI_SERV_PORT=5200
    • Uncomment the line defining the PORTOPTS variable and change it to:
PORTOPTS="-Dprotege.rmi.server.port=$RMI_SERV_PORT -Dprotege.rmi.registry.port=$RMI_REG_PORT"
    • Add the RMI_REG_PORT argument to the rmiregistry command, like this:
$JAVA_PATH/rmiregistry $RMI_REG_PORT &
  • Make sure you configure your firewall to allow TCP traffic to the RMI_REG_PORT and RMI_SERV_PORT

See Also

  • Protege Client Server Tutorial Advanced