On my Linux system (Fedora Core 3) running Java 1.4.2_06, when I configure Tomcat to use SimpleTcpCluster and then try to shut Tomcat down, the shutdown takes a long time to complete and finally stops with an error, as shown in this log excerpt:

2005-11-15 13:48:44,202 INFO Pausing Coyote HTTP/1.1 on http-8888
2005-11-15 13:48:44,202 INFO Pausing Coyote HTTP/1.1 on http-8444
2005-11-15 13:48:45,205 INFO Stopping service Catalina
2005-11-15 13:48:45,206 INFO Manager [/flexnet] expiring sessions upon shutdown
2005-11-15 13:48:45,781 INFO Stopped ClusterSender at cluster Catalina:type=Cluster,host=localhost with name Catalina:type=ClusterSender,host=localhost
2005-11-15 13:50:50,440 INFO Stopping Coyote HTTP/1.1 on http-8888
2005-11-15 13:50:50,440 INFO Stopping Coyote HTTP/1.1 on http-8444
2005-11-15 13:50:50,448 ERROR Unable to process request in ReplicationListener
java.nio.channels.ClosedSelectorException
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:55)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:70)
    at org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:130)
    at org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
    at java.lang.Thread.run(Thread.java:534)
2005-11-15 13:50:50,472 ERROR Unable to start cluster listener.
java.lang.NullPointerException
    at org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:182)
    at org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
    at java.lang.Thread.run(Thread.java:534)

Notice the long delay (about two minutes) between stopping the ClusterSender and stopping Coyote. This is apparently caused by a bug in Java 1.4.2 with closing a Selector while selects are still active on it.
There is a simple fix to org.apache.catalina.cluster.tcp.ReplicationListener.stopListening:

--- ReplicationListener.java	2005-11-16 09:02:50.055300180 -0800
+++ ReplicationListener.java.fix	2005-11-16 09:02:45.017605588 -0800
@@ -187,8 +187,11 @@
      * @see org.apache.catalina.cluster.tcp.ClusterReceiverBase#stopListening()
      */
     protected void stopListening(){
+        doListen = false;
         if ( selector != null ) {
             try {
+                for ( int i = 0; i < getTcpThreadCount(); i++ )
+                    selector.wakeup();
                 selector.close();
             } catch ( Exception x ) {
                 log.error("Unable to close cluster receiver selector.",x);
@@ -196,7 +199,6 @@
                 selector = null;
             }
         }
-        doListen = false;
     }

Basically, move the 'doListen = false' to the top of the method to avoid a race condition that causes the exceptions: selector.select() may be called from listen() while the close is in progress, and the selector may be set to null while the listener threads are still looping. The loop that calls selector.wakeup() once for each thread before calling selector.close() works around the Java bug with closing while selects are in progress.
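For anyone who wants to try the ordering outside of Tomcat, here is a minimal standalone sketch of the same idea: clear the flag first, wake the pending select, then close. It is not the Tomcat code. The class SelectorShutdownSketch, the failure field and runDemo() are made up for illustration, and it assumes a single listener thread, so one wakeup() is enough instead of one per thread.

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Illustrative sketch only: hypothetical stand-in for the listener/shutdown pair.
public class SelectorShutdownSketch {
    private volatile boolean doListen = true;
    private volatile Selector selector;
    private volatile Throwable failure; // records anything unexpected

    public SelectorShutdownSketch() throws IOException {
        selector = Selector.open();
    }

    // Listener loop: blocks in select() until woken up or closed.
    void listen() {
        // Use a local reference so a concurrent "selector = null" cannot cause an NPE.
        Selector sel = selector;
        try {
            while (doListen && sel != null && sel.isOpen()) {
                sel.select();               // blocks; returns early after wakeup()
                sel.selectedKeys().clear(); // a real listener would process the keys here
            }
        } catch (ClosedSelectorException e) {
            // Expected if the selector is closed during shutdown; an error otherwise.
            if (doListen) {
                failure = e;
            }
        } catch (IOException e) {
            failure = e;
        }
    }

    // Shutdown in the order used by the patch: flag first, then wakeup, then close.
    void stopListening() {
        doListen = false;
        Selector sel = selector;
        if (sel != null) {
            try {
                sel.wakeup(); // unblock the pending select() before closing
                sel.close();
            } catch (IOException e) {
                failure = e;
            } finally {
                selector = null;
            }
        }
    }

    // Starts a listener, stops it, and reports whether it exited promptly and cleanly.
    public static boolean runDemo() {
        try {
            SelectorShutdownSketch s = new SelectorShutdownSketch();
            Thread t = new Thread(s::listen, "listener");
            t.start();
            Thread.sleep(200); // give the listener time to block in select()
            s.stopListening();
            t.join(5000);
            return !t.isAlive() && s.failure == null;
        } catch (IOException | InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo() ? "listener stopped cleanly" : "listener did not stop cleanly");
    }
}
```

On current JVMs close() already wakes a blocked select(), so this will not reproduce the 1.4.2 delay; it only shows the ordering and the local-reference guard against the NullPointerException.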
Looks like a good catch, thanks for reporting it.