CompressionAndRMI

I am writing this page in case there are others who try to implement compression over RMI and find that it is more difficult than expected. There are several other pages that describe how to create and use an RMI socket factory. The tricky part is developing the compressing input and output streams that the RMI socket will use. The main difficulties are:

  1. The client hangs when trying to communicate with the server. This is what happens if you simply wrap the socket streams in a GZIPInputStream and a GZIPOutputStream. The flush operation on the GZIPOutputStream is not sufficient to have the desired effect on the client. RMI expects that when it flushes the output stream on the server, all the flushed data will appear on the client; instead, the client is left waiting for the rest of the data. I believe the issue is that the GZIPOutputStream is in the middle of a compress operation and is not ready to flush everything down the wire.
  2. A common solution to the above is to modify the GZIPOutputStream so that its flush() operation invokes finish(). This is indeed useful for flushing out all the data. Unfortunately, what I found is that when RMI later writes to this same stream again after the flush, an exception is thrown, because one is not supposed to finish a GZIPOutputStream and then continue writing (see the sketch after this list).
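
To make the second pitfall concrete, here is a minimal sketch of that workaround; the class name FinishingGZIPOutputStream is mine, used only for illustration. It produces exactly one good flush and then breaks the stream for the next RMI call.

<code><pre>
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// The common workaround from item 2: force all compressed data out by calling
// finish() inside flush().  The client now sees the data, but the stream is
// finished, so the next RMI call that writes to the same socket stream throws
// an exception.
public class FinishingGZIPOutputStream extends GZIPOutputStream {

    public FinishingGZIPOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    public void flush() throws IOException {
        finish();     // writes the remaining compressed data and the GZIP trailer
        out.flush();  // pushes it onto the wire, but nothing more may be written
    }
}
</pre></code>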

So what I needed was a way to tell the compressing stream that I had a unit that could be compressed in its entirety without interfering with future writes. My solution was to make a compressing output stream based on a buffered version of a [http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/ZipOutputStream.html ZipOutputStream]. My idea was to buffer up the data to be sent and only send it over the wire when either the buffer is full or flush is called explicitly:

<code><pre>
public class CompressingOutputStream extends OutputStream {
    private byte[] data = new byte[BUFFER_SIZE];  // BUFFER_SIZE is a constant defined elsewhere in the class
    int offset = 0;  // the next location in the buffer to write to
                     // also doubles as the size of the unflushed data

        ...

    @Override
    public void write(int b) throws IOException {
        ensureNotFull();            // flush first if the buffer is already full
        data[offset++] = (byte) b;
        ensureNotFull();            // flush right away if this write filled the buffer
    }

        ...

    private void ensureNotFull() throws IOException {
        if (offset >= BUFFER_SIZE) {
            flush();
        }
    }

        ...
}
</pre></code>

When data is flushed for either of these reasons, the entire buffer is compressed and sent over the wire as one self-contained unit, so nothing is left half-compressed in the stream and later writes are unaffected.
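
As an illustration only (not the actual Protege code), a flush along these lines might look like the following fragment of the class above, assuming the hypothetical fields compressing (a java.util.zip.ZipOutputStream wrapping the socket's output stream) and blockCounter:

<code><pre>
    // Hypothetical sketch of flush() for the class above, not the actual code.
    private ZipOutputStream compressing;   // assumed: wraps the socket's raw output stream
    private int blockCounter = 0;          // assumed: gives each flushed unit its own entry name

    @Override
    public void flush() throws IOException {
        if (offset == 0) {
            return;                                   // nothing buffered, nothing to send
        }
        compressing.putNextEntry(new ZipEntry("block-" + blockCounter++));
        compressing.write(data, 0, offset);           // compress the whole buffer as one unit
        compressing.closeEntry();                     // complete the entry without finishing the stream
        compressing.flush();                          // push the compressed bytes onto the wire
        offset = 0;                                   // the buffer is empty again
    }
</pre></code>

The matching CompressingInputStream on the other end can then read one complete zip entry at a time, which is the kind of self-contained unit that the plain GZIP streams above could not provide.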

The full implementation can be found [http://smi-protege.stanford.edu/repos/protege/protege-core/trunk/src/edu/stanford/smi/protege/server/socket here]. In particular, [http://smi-protege.stanford.edu/repos/protege/protege-core/trunk/src/edu/stanford/smi/protege/server/socket/CompressingInputStream.java CompressingInputStream.java] and [http://smi-protege.stanford.edu/repos/protege/protege-core/trunk/src/edu/stanford/smi/protege/server/socket/CompressingOutputStream.java CompressingOutputStream.java] are the key to the implementation.
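
Finally, as the other pages on RMI socket factories describe, streams like these get attached to RMI through a socket factory. Below is a minimal sketch of the client side, assuming that the CompressingInputStream and CompressingOutputStream constructors take the stream they wrap; the factory class name is hypothetical.

<code><pre>
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.Socket;
import java.rmi.server.RMIClientSocketFactory;

// Sketch only: wraps each socket's streams in compressing streams like those above.
public class CompressingClientSocketFactory implements RMIClientSocketFactory, Serializable {

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return new Socket(host, port) {
            private InputStream in;
            private OutputStream out;

            @Override
            public synchronized InputStream getInputStream() throws IOException {
                if (in == null) {
                    in = new CompressingInputStream(super.getInputStream());
                }
                return in;
            }

            @Override
            public synchronized OutputStream getOutputStream() throws IOException {
                if (out == null) {
                    out = new CompressingOutputStream(super.getOutputStream());
                }
                return out;
            }
        };
    }
}
</pre></code>

A real factory also needs a matching RMIServerSocketFactory on the server side, and sensible equals() and hashCode() implementations so that RMI can reuse connections.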