Commons JCS / JCS-41

RemoteCache & RemoteCacheServerFactory setting RMISocketFactory

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: jcs-1.3, jcs-2.0-beta-1
    • Fix Version/s: jcs-2.0-beta-1
    • Component/s: RMI Remote Cache
    • Labels:
      None
    • Environment:
      All

      Description

      The classes

      org.apache.jcs.auxiliary.remote.RemoteCache
      and
      org.apache.jcs.auxiliary.remote.server.RemoteCacheServerFactory

      both try to set a timeout on RMI connections between the remote cache server and client machines. They use the following code to install a timeout-enabled socket factory, which the RMI subsystem subsequently uses:

      RMISocketFactory.setSocketFactory( new RMISocketFactory() {
          public Socket createSocket( String host, int port ) throws IOException {
              Socket socket = new Socket( host, port );
              socket.setSoTimeout( DEFAULT_RMI_SOCKET_FACTORY_TIMEOUT_MS );
              socket.setSoLinger( false, 0 );
              return socket;
          }

          public ServerSocket createServerSocket( int port ) throws IOException {
              return new ServerSocket( port );
          }
      });

      The socket factory code above applies a "read timeout" to RMI sockets such that if a connection is already established and subsequently stalls or a machine goes offline, the timeout will break the connection as intended. The code does not apply a "connect timeout" however, which means that if an attempt is made to establish a new connection to a machine which is offline, the socket connection attempt will stall for an infinite amount of time (such is the default connect timeout), and therefore the thread opening the connection will stall permanently in the JVM.

      This is not a bug in the JCS code; it was a limitation of JDK3, where AFAIK you could not set a connection timeout on a socket.

      As of JDK4, there is a socket.connect(address, timeout) method, so this issue can be fixed.

      Here's the required replacement code:

      RMISocketFactory.setSocketFactory( new RMISocketFactory() {
          public Socket createSocket( String host, int port ) throws IOException {
              Socket socket = new Socket();
              socket.setSoTimeout( timeoutMillis );
              socket.setSoLinger( false, 0 );
              socket.connect( new InetSocketAddress( host, port ), timeoutMillis );
              return socket;
          }

          public ServerSocket createServerSocket( int port ) throws IOException {
              return new ServerSocket( port );
          }
      });

      This was an issue for us recently. We fixed it by installing the RMISocketFactory above in the JVM before initializing JCS. We have tested and confirmed that this code works well with JCS: it times out reads the same as before, and it now times out new connection attempts too.
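The workaround described above could be packaged as a small standalone class. This is a minimal sketch, not the code that eventually shipped in JCS: the class name `TimeoutRmiSocketFactory`, the `install` helper and the choice to close the socket on a failed connect are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

/** Hypothetical helper: applies one timeout to both connect and read. */
class TimeoutRmiSocketFactory extends RMISocketFactory {

    private final int timeoutMillis;

    TimeoutRmiSocketFactory(int timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket(); // unconnected, so connect() can take a timeout
        try {
            socket.setSoTimeout(timeoutMillis);  // read timeout
            socket.setSoLinger(false, 0);
            socket.connect(new InetSocketAddress(host, port), timeoutMillis); // connect timeout
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor when the connect fails
            throw e;
        }
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }

    /** Intended to be called once at startup, before JCS is initialized. */
    static void install(int timeoutMillis) throws IOException {
        RMISocketFactory.setSocketFactory(new TimeoutRmiSocketFactory(timeoutMillis));
    }
}
```

An application would call something like `TimeoutRmiSocketFactory.install(5000)` early in its startup code; the 5000 ms value is only an example and would need tuning for the network in question.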

      How about including this in the next version of JCS?

      By the way I read some JCS mailing list archives from last time socket timeouts were discussed. Not sure if this will be helpful to anyone... but we found that if an RMI client and server were running on the same subnet, timeouts were not required and each machine detected that the other was offline immediately. Cross-subnet through our router however, timeouts became important as attempts to connect to an offline machine resulted in JCS threads hanging whilst trying to connect.

      We are not sure, but we suspect that this is related to our firewall blocking the required ICMP "host unreachable" packets between subnets, causing different behaviour depending on the network setup. The replacement code above allows our machines in both subnets to recover when a machine is offline; previously they just stalled.

        Activity

        Alistair Forbes added a comment -

        Doesn't the jdk parameter -Dsun.rmi.transport.tcp.readTimeout=xxx achieve the same thing?

        I seem to remember that I had an issue with this timeout...I think it is because JCS sets the timeout for all RMI connections. So if you have RMI connections to other long running services, these get the JCS timeouts.
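One way to avoid imposing the timeout on unrelated RMI services is to scope it to individual connections with an `RMIClientSocketFactory` instead of the JVM-wide `RMISocketFactory`. The sketch below is not what JCS does; the class name `ScopedTimeoutClientSocketFactory` is hypothetical, and the approach assumes you control how the remote objects are exported or looked up.

```java
import java.io.IOException;
import java.io.Serializable;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.rmi.server.RMIClientSocketFactory;

/** Hypothetical per-connection factory; serializable so it can travel with a stub. */
class ScopedTimeoutClientSocketFactory implements RMIClientSocketFactory, Serializable {

    private static final long serialVersionUID = 1L;

    private final int timeoutMillis;

    ScopedTimeoutClientSocketFactory(int timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            socket.setSoTimeout(timeoutMillis);
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    // Factories with the same timeout compare equal, which lets RMI
    // treat them as interchangeable when reusing connections.
    @Override
    public boolean equals(Object other) {
        return other instanceof ScopedTimeoutClientSocketFactory
                && ((ScopedTimeoutClientSocketFactory) other).timeoutMillis == timeoutMillis;
    }

    @Override
    public int hashCode() {
        return timeoutMillis;
    }
}
```

Such a factory can be passed to `LocateRegistry.getRegistry(host, port, csf)` for registry lookups. Stubs obtained from the registry use whichever client socket factory the server exported them with, so the server would also need to supply it via `UnicastRemoteObject.exportObject(obj, port, csf, ssf)`.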

        Niall Gallagher added a comment - edited

        Yes JCS does set the timeout for all RMI connections, it's a JVM-wide thing. It's not a problem for us that JCS sets this for the whole JVM though.

        It looks like -Dsun.rmi.transport.tcp.readTimeout=xxx achieves the same thing as the existing JCS code actually. "The value is passed to java.net.Socket.setSoTimeout" according to http://java.sun.com/j2se/1.3/docs/guide/rmi/sunrmiproperties.html

        ...which is basically what the JCS socket factory does.

        There's no jdk parameter to configure RMI "connection" timeouts as opposed to just "read" timeouts as far as I can see for any of JDK 3, 4, 5 or 6.

        ...Basically this JDK4 method is needed to do that (i.e. programmatically only):
        http://java.sun.com/j2se/1.4.2/docs/api/java/net/Socket.html#connect(java.net.SocketAddress,%20int)

        It looks like the intention of the existing code was to configure socket timeouts in general however? i.e. read AND connection timeouts?

        By the way, we've tried -Dsun.rmi.transport.connectionTimeout, but it doesn't seem to apply to the initial "establishing" of connections. The docs seem to indicate that it applies to connections which are established but then not used.
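Because the factory is JVM-wide, it can also be installed only once: `RMISocketFactory.setSocketFactory` throws an `IOException` if a factory has already been set. Code that installs its own factory before JCS starts may therefore want to guard the call. A minimal sketch, in which the class and method names are hypothetical:

```java
import java.io.IOException;
import java.rmi.server.RMISocketFactory;

/** Hypothetical guard around the set-once, JVM-wide RMI socket factory. */
class RmiFactoryInstaller {

    /**
     * Installs the given factory only if no global factory has been set yet.
     * Returns true if this call installed it, false if one was already present.
     */
    static synchronized boolean installIfAbsent(RMISocketFactory factory) throws IOException {
        if (RMISocketFactory.getSocketFactory() != null) {
            return false; // another component got there first; leave it alone
        }
        RMISocketFactory.setSocketFactory(factory);
        return true;
    }
}
```

The `synchronized` keyword only serializes callers of this helper; it does not stop other code from calling `setSocketFactory` directly, so the `IOException` still needs handling.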

        Aaron Smuts added a comment -

        I implemented the fix and made the setting configurable on the server. It will be in the 1.3.2.0-rc temp build.


          People

          • Assignee:
            Aaron Smuts
            Reporter:
            Niall Gallagher
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              1h
              Remaining:
              1h
              Logged:
              Not Specified