Troubleshooting Client Server Connections

From Protege Wiki
Revision as of 11:09, August 4, 2009 by Tredmond

Client-Server Troubleshooting

This page is in the very early stages. When troubleshooting, it helps to understand that the client first connects to the RMI registry (rmiregistry) to get a reference to the server, and then uses that server reference to contact the server itself. This protocol is discussed here (ignore the black magic tricks unless they help you get a better understanding).
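The two-step protocol can be seen in a small, self-contained Java RMI program. This is a minimal sketch, not Protege code: the Greeter interface, the binding name "greeter" and the class names are hypothetical, and everything runs in one JVM on 127.0.0.1, with a free port standing in for the registry port. The client first asks the registry for the server reference (lookup) and only then calls the server at the host and port recorded inside that reference.

```java
import java.net.ServerSocket;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

final class TwoStepRmiDemo {
    // Hypothetical remote interface standing in for the Protege server.
    interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    // Hypothetical server implementation (not Protege code).
    static final class GreeterImpl implements Greeter {
        @Override
        public String greet(String name) {
            return "hello " + name;
        }
    }

    static String run() throws Exception {
        // The address the server writes into its reference (cf. -Djava.rmi.server.hostname).
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        // Pick a free port for the registry (a stand-in for 1099 or a configured port).
        int registryPort;
        try (ServerSocket probe = new ServerSocket(0)) {
            registryPort = probe.getLocalPort();
        }

        Registry registry = LocateRegistry.createRegistry(registryPort);
        GreeterImpl impl = new GreeterImpl();
        try {
            // Server startup: export on an anonymous port, hand the reference to the registry.
            Greeter serverRef = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
            registry.bind("greeter", serverRef);

            // Client, step 1: ask the registry for the server reference.
            Registry clientView = LocateRegistry.getRegistry("127.0.0.1", registryPort);
            Greeter stub = (Greeter) clientView.lookup("greeter");
            System.out.println("Server ref = " + stub);

            // Client, step 2: call the server at the host:port stored in that reference.
            return stub.greet("client");
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Because the registry and the server listen on different ports, a client can succeed at the first step and still fail at the second.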

The first thing to do is to look at any messages on the client and server consoles. The most fundamental error that can arise is the following exception, which appears on the client console:

SEVERE: java.rmi.ConnectException: Connection refused to host: localhost; nested exception is: 
	java.net.ConnectException: Connection refused
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
	at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
	at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
	at java.rmi.Naming.lookup(Naming.java:84)
	at edu.stanford.smi.protege.server.ServerPanel.connectToHost(ServerPanel.java:140)

Note that the exception happens during the call to Naming.lookup. Usually the client will also pop up a window that says "Unable to Connect to Server". This means that the client could not connect to the rmiregistry: possibly the rmiregistry is not running, or there is a firewall or other network problem.
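To check this first step on its own, independently of Protege, a small Java program can contact the registry directly. This is a sketch assuming you know the registry host and port (1099 unless changed); the class name RegistryCheck is hypothetical. LocateRegistry.getRegistry only builds a local stub, so the list() call is what actually opens the connection, much like Naming.lookup in the trace above.

```java
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

final class RegistryCheck {
    // Returns true if an RMI registry answers on host:port.
    static boolean registryAnswers(String host, int port) {
        try {
            // No network traffic yet: this only creates a stub pointing at host:port.
            Registry registry = LocateRegistry.getRegistry(host, port);
            // First real connection; fails the same way Naming.lookup does.
            String[] names = registry.list();
            System.out.println("Registry at " + host + ":" + port + " binds " + names.length + " name(s)");
            return true;
        } catch (RemoteException e) {
            // java.rmi.ConnectException (a RemoteException) ends up here when nothing answers.
            System.out.println("Registry at " + host + ":" + port + " unreachable: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : Registry.REGISTRY_PORT;
        registryAnswers(host, port);
    }
}
```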

A very different error that can appear on the client console is the following exception:

SEVERE: java.rmi.ConnectException: Connection refused to host: 67.180.198.51; nested exception is: 
	java.net.ConnectException: Connection timed out
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:574)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:185)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:94)
	at edu.stanford.smi.protege.server.Server_Stub.openSession(Unknown Source)
	at edu.stanford.smi.protege.server.ServerPanel.createSession(ServerPanel.java:154)

This is usually accompanied by the message "Failed to create a session because of either an invalid username/password combination, or a firewall problem." Unfortunately this pop-up dialog is very ambiguous and can mean many things, which is why we have tried to add some additional information to help. In any case, the exception tells us two very important things. First, the problem occurs during an attempt to create a session (ServerPanel.createSession), when the client makes a call to the server (Server_Stub.openSession(Unknown Source)). Second, since the client is talking to the server, it has already talked to the rmiregistry to obtain a reference to the server. That is, this exception means that the client has gotten quite a bit further than it did in the previous case.
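Since the frames in a client stack trace tell you how far the client got, that reading can be written down as a heuristic. This sketch is hypothetical (RmiFailureStage is not part of Protege) and only looks for the frame names that appear in the two traces on this page: a RegistryImpl_Stub or Naming.lookup frame means the registry lookup failed, and a Server_Stub frame means the client already had a server reference.

```java
import java.util.ArrayList;
import java.util.List;

final class RmiFailureStage {
    enum Stage { REGISTRY_LOOKUP, SERVER_CALL, UNKNOWN }

    // Heuristic: scan from the top of the trace for the first RMI stub frame involved.
    static Stage classify(List<String> frames) {
        for (String frame : frames) {
            if (frame.contains("RegistryImpl_Stub.") || frame.contains("java.rmi.Naming.lookup")) {
                return Stage.REGISTRY_LOOKUP; // the client never obtained a server reference
            }
            if (frame.contains("Server_Stub.")) {
                return Stage.SERVER_CALL; // the registry lookup had already succeeded
            }
        }
        return Stage.UNKNOWN;
    }

    // Convenience overload for an exception caught in code.
    static Stage classify(Throwable t) {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement element : t.getStackTrace()) {
            frames.add(element.toString());
        }
        return classify(frames);
    }
}
```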

Note that the connection to the host 67.180.198.51 is being refused. It is important to look at this address, because the problem is often that the connection is being made to the wrong place. Another message expands on this issue:

Server ref = Server_Stub[UnicastRef2 [liveRef: [endpoint:[67.180.198.51:38547,edu.stanford.smi.protege.server.socket.RmiSocketFactory@2](remote),objID:[-4da377e5:122e64da065:-8000, 0]]]]

This is a printout of the server reference that (1) the server passed to the rmiregistry at server startup and (2) the rmiregistry gave to the client to tell it how to contact the server. Note the endpoint:

    67.180.198.51:38547

This is the host and port that the rmiregistry thinks should be used to connect to the server. The Connection Refused exception suggests that there is some problem with this endpoint.
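When a client log contains a Server ref line, the endpoint can be pulled out mechanically and compared with the host and port you expect. This is a minimal sketch assuming the printed format matches the example above; the regular expression handles host names and IPv4 addresses but not IPv6 literals, and ServerRefEndpoint is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerRefEndpoint {
    // Matches "endpoint:[host:port" inside a printed RMI server reference.
    private static final Pattern ENDPOINT = Pattern.compile("endpoint:\\[([^:,\\]]+):(\\d+)");

    final String host;
    final int port;

    private ServerRefEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Returns the first endpoint found in the line, or empty if there is none.
    static Optional<ServerRefEndpoint> parse(String line) {
        Matcher m = ENDPOINT.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ServerRefEndpoint(m.group(1), Integer.parseInt(m.group(2))));
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}
```

For the line above this yields 67.180.198.51 and 38547; that host is the one the client must be able to reach, and the pair can be tried directly with telnet.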

One problem that can occur is that the server misrepresents itself when talking to the rmiregistry. At this point it is worth glancing at the first few paragraphs of the description of the RMI protocol. The server decides how to represent itself to the rmiregistry based on the line:

    -Djava.rmi.server.hostname=`hostname`

Sometimes this line doesn't do the right thing because the hostname command returns the wrong value. For instance, perhaps the hostname command returns smi-tredmond-li instead of smi-tredmond-li.stanford.edu. Depending on your network configuration, a client may be able to resolve the second of these names but not the first. The hostname command also sometimes returns localhost, which will not work for any client on a different machine from the server.
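A quick way to see which names the server machine reports for itself is to ask the JVM. This sketch prints the java.rmi.server.hostname property (if set) along with the names InetAddress reports for the local host, and flags the two problems described above: loopback names such as localhost or 127.0.1.1, and names that are not fully qualified. The warning rules are a rough heuristic of our own, not a check that RMI or Protege performs, and a clean result does not prove that remote clients can resolve the name. AdvertisedHostCheck is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

final class AdvertisedHostCheck {
    // Heuristic only: returns a warning for names remote clients are unlikely to use, else null.
    static String warningFor(String name) {
        String n = name.trim().toLowerCase(Locale.ROOT);
        if (n.equals("localhost") || n.startsWith("127.")) {
            return name + " is a loopback name; only clients on the server machine can use it";
        }
        if (!n.contains(".")) {
            return name + " is not fully qualified; remote clients may fail to resolve it";
        }
        return null; // no obvious problem, which is not proof that it is reachable
    }

    public static void main(String[] args) throws UnknownHostException {
        String configured = System.getProperty("java.rmi.server.hostname");
        InetAddress local = InetAddress.getLocalHost();
        String[] candidates = {
            configured, local.getHostName(), local.getCanonicalHostName(), local.getHostAddress()
        };
        for (String candidate : candidates) {
            if (candidate == null) {
                continue; // java.rmi.server.hostname was not set
            }
            String warning = warningFor(candidate);
            System.out.println(candidate + ": " + (warning == null ? "looks plausible" : warning));
        }
    }
}
```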

The client console/log is not the only place to look for hints about a problem; the server console can also have useful information. For example, if I simply supply the wrong user name or password, the server prints out the message:

  WARNING: Failed login for user Timothy Redmond IP: 127.0.1.1 -- Server.openSession()

In addition, no exceptions are generated on either the server or the client. This means that the client successfully contacted the server, and the server successfully processed the request but rejected it because the credentials were wrong.

Networks, Firewalls and Telnet

It is out of scope for the Protege team to diagnose network problems: there are simply too many things that can go wrong and too many things to know. The primary indicator that you need to start thinking about network issues is that the server works fine on localhost or on the LAN but not from outside the LAN. In this case, telnet is a useful tool for quickly evaluating where a problem is located. All real operating systems, with possibly (grudgingly admitted) one exception, have a telnet command. If your operating system is under-configured, you can try this link for enabling telnet or [1]. I haven't tried either, but you should also feel free to complain to your vendor for leaving out such a basic command.

What telnet does is allow you to connect to a hostname and a port. If you know a protocol in detail, it is also possible (though usually not recommended) to talk a bit with a server this way. For example, using the host www.google.com and port 80 you can try GET /. There are also instructions on various web pages for talking with mail servers, etc. In our case, all we are interested in is whether the client can connect. Depending on your telnet client, a successful connection will look something like this:

[tredmond@smi-tredmond-li org.protege.osgi.jdbc.prefs]$ telnet smi-tredmond-li 5100
Trying 127.0.1.1...
Connected to smi-tredmond-li.
Escape character is '^]'.

At this point, with this client, type Control-] and then quit, and telnet will exit. If there is a problem, it may take a while for telnet to realize that it cannot connect; when it can connect, it usually does so very quickly.
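If telnet is not available, the same connectivity test can be done with a few lines of Java. This sketch only checks whether a TCP connection can be opened, which is all the telnet test above tells you; PortProbe is a hypothetical name and the 10-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    // Roughly what "telnet host port" checks: can a TCP connection be opened at all?
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host name not resolvable.
            System.out.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: PortProbe <host> <port>");
            return;
        }
        boolean ok = canConnect(args[0], Integer.parseInt(args[1]), 10_000);
        System.out.println(ok ? "Connected" : "Could not connect");
    }
}
```

Running it with the arguments smi-tredmond-li 5100 corresponds to the telnet session above. Unlike telnet, the explicit timeout makes a filtered port fail in bounded time instead of hanging.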

There are two connections that need to be tested. The first is the connection to the rmiregistry. By default the rmiregistry runs on port 1099, but this can be changed when the rmiregistry is started with the command

    rmiregistry [port #]

In addition, the server is configured to talk to a particular rmiregistry port with the JVM definition

     -Dprotege.rmi.registry.port=[port #]

The rmiregistry port that the server is using can also be seen in a server console message near the top:

    Server port = 5200, registry port = 5100, compressed stream
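If you only have the server console output, the two ports can be read from that startup line. This sketch assumes the line always has exactly the wording shown above (only one example is available here); ServerBanner is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerBanner {
    // Assumes the wording of the example: "Server port = N, registry port = M".
    private static final Pattern PORTS =
        Pattern.compile("Server port = (\\d+), registry port = (\\d+)");

    final int serverPort;
    final int registryPort;

    private ServerBanner(int serverPort, int registryPort) {
        this.serverPort = serverPort;
        this.registryPort = registryPort;
    }

    static Optional<ServerBanner> parse(String consoleLine) {
        Matcher m = PORTS.matcher(consoleLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ServerBanner(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))));
    }
}
```

The registry port is the connection the client makes first; the server port is the second connection to test.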

The other connection that needs to be tested is the server connection. The server port can be found in the server reference message, if you have one in a client's log:

Server ref = Server_Stub[UnicastRef2 [liveRef: [endpoint:[67.180.198.51:38547,edu.stanford.smi.protege.server.socket.RmiSocketFactory@2](remote),objID:[-4da377e5:122e64da065:-8000, 0]]]]

but the easiest way to know the server port is to set it with a JVM definition:

    -Dprotege.rmi.server.port=5200

If telnet cannot connect, then neither will Protege, and the problem is to figure out why the connection failed.
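Putting the two checks together, a small program can test both ports from the client machine and map the result onto the symptoms described on this page. This is a sketch with hypothetical names. It tests the host you give it, whereas the client actually contacts the server at the endpoint inside the server reference, so if both ports are reachable it is worth comparing that host with the Server ref line.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectionChecklist {
    // Same check as telnet: can a TCP connection be opened within the timeout?
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Maps the two results onto the client-side symptoms described on this page.
    static String diagnose(boolean registryOk, boolean serverOk) {
        if (!registryOk) {
            return "registry port unreachable: expect Naming.lookup to fail (Unable to Connect to Server)";
        }
        if (!serverOk) {
            return "server port unreachable: expect Server_Stub.openSession to fail (Failed to create a session)";
        }
        return "both ports reachable from here: compare this host with the endpoint in the Server ref line";
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: ConnectionChecklist <host> <registryPort> <serverPort>");
            return;
        }
        String host = args[0];
        boolean registryOk = reachable(host, Integer.parseInt(args[1]), 10_000);
        boolean serverOk = reachable(host, Integer.parseInt(args[2]), 10_000);
        System.out.println(diagnose(registryOk, serverOk));
    }
}
```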