During replication (after the context has started) on our production system, I sometimes get the following error with fastasyncqueue:

SCHWERWIEGEND: TCP Worker thread in cluster caught 'java.lang.ArrayIndexOutOfBoundsException: -869396170' closing channel
java.lang.ArrayIndexOutOfBoundsException: -869396170
    at org.apache.catalina.cluster.io.XByteBuffer.firstIndexOf(XByteBuffer.java:317)
    at org.apache.catalina.cluster.io.XByteBuffer.countPackages(XByteBuffer.java:170)
    at org.apache.catalina.cluster.io.ObjectReader.append(ObjectReader.java:87)
    at org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:127)
    at org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)

This error stopped the replication thread, so replication came to a halt.

On the sending instance I see the following error:

WARNUNG: Message lost: [192.168.13.17:4.001] type=[org.apache.catalina.cluster.session.SessionMessageImpl], id=[C48819FFB61BD5EC7A37867EA1626B5F.1-1133790203828]
java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
    at org.apache.catalina.cluster.tcp.DataSender.writeData(DataSender.java:858)
    at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:799)
    at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:476)
    at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:442)

My cluster config looks like this:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true"
         notifyListenersOnReplication="true"
         doClusterLog="true"
         clusterLogName="clusterlog">
    <Membership className="org.apache.catalina.cluster.mcast.McastService"
                mcastAddr="228.0.0.4"
                mcastBindAddress="192.168.13.7"
                mcastPort="45564"
                mcastFrequency="500"
                mcastDropTime="3000"/>
    <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
              tcpListenAddress="auto"
              tcpListenPort="4001"
              tcpSelectorTimeout="100"
              tcpThreadCount="6"/>
    <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
            replicationMode="fastasyncqueue"
            compress="true"
            doTransmitterProcessingStats="true"
            waitForAck="false"
            autoConnect="false"/>
    <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
           filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
    <ClusterListener className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
</Cluster>

I use 5.5.13 (beta) with jdk1.5.0_6 on Win2000 SP4.

Regards,
Dietmar
Very strange! I can't reproduce this behaviour on Windows XP or SUSE Linux. Please test your config with the SocketReplicationListener:

<Receiver className="org.apache.catalina.cluster.tcp.SocketReplicationListener"
          tcpListenAddress="auto"
          tcpListenPort="4001"/>

Peter
Found the bug inside XByteBuffer. The message header length has changed, and on networks that split messages into smaller chunks the reported exception is possible. A very production-critical bug, Arrghh!

Thanks, Dietmar, for reporting this and testing the fix.

Peter

PS: Fixed in 5.5.15.
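
For readers wondering how a header length and split TCP chunks interact, here is a minimal sketch of a package counter that tolerates partial reads. It is not the XByteBuffer code and not the actual fix: the frame layout (a 4-byte big-endian payload length followed by the payload) and the names FrameCounter, HEADER_LENGTH, countCompleteFrames and frame are all hypothetical, chosen only for illustration. The point is that the parser must check that the whole header has arrived, using the correct header length, before it computes any index from it; otherwise it can read bytes that are not there yet and end up with a garbage or negative offset.

```java
import java.nio.ByteBuffer;

/** Hypothetical length-prefixed frame counter; NOT Tomcat's XByteBuffer. */
final class FrameCounter {

    /** Assumed header: a 4-byte big-endian payload length. */
    static final int HEADER_LENGTH = 4;

    private FrameCounter() {
    }

    /**
     * Counts the frames that are fully present in the first 'available'
     * bytes of 'data'. A trailing partial frame is not counted; the caller
     * is expected to keep it and retry after the next read.
     */
    static int countCompleteFrames(byte[] data, int available) {
        ByteBuffer buf = ByteBuffer.wrap(data, 0, available);
        int count = 0;
        // Only look at the header once all of its bytes have arrived.
        while (buf.remaining() >= HEADER_LENGTH) {
            // Absolute read: peeks at the length without moving the position.
            int payloadLength = buf.getInt(buf.position());
            if (payloadLength < 0) {
                throw new IllegalStateException("corrupt header: " + payloadLength);
            }
            // Subtract on the right-hand side so the comparison cannot overflow.
            if (payloadLength > buf.remaining() - HEADER_LENGTH) {
                break; // the rest of the payload is still in flight
            }
            buf.position(buf.position() + HEADER_LENGTH + payloadLength);
            count++;
        }
        return count;
    }

    /** Builds one frame: the length header followed by the payload. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(HEADER_LENGTH + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }
}
```

With a frame carrying 3 payload bytes (7 bytes in total), calling countCompleteFrames with available set to 2 (header split) or 5 (payload split) returns 0, and only the full 7 bytes yield 1.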