Bug 37808 - Worker thread in cluster caught 'java.lang.ArrayIndexOutOfBoundsException: -869396170' closing channel
Summary: Worker thread in cluster caught 'java.lang.ArrayIndexOutOfBoundsException: -8...
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina:Cluster
Version: Unknown
Hardware: PC Windows 2000
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL: http://www.eurotours.at
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-06 15:58 UTC by dietmar müller
Modified: 2005-12-12 04:30 UTC
Description dietmar müller 2005-12-06 15:58:55 UTC
During replication (after the context has started) on our production system, I
sometimes get the following error (with fastasyncqueue):

SEVERE: TCP Worker thread in cluster caught 'java.lang.ArrayIndexOutOfBoundsException: -869396170' closing channel
java.lang.ArrayIndexOutOfBoundsException: -869396170
at org.apache.catalina.cluster.io.XByteBuffer.firstIndexOf(XByteBuffer.java:317)
at org.apache.catalina.cluster.io.XByteBuffer.countPackages(XByteBuffer.java:170)
at org.apache.catalina.cluster.io.ObjectReader.append(ObjectReader.java:87)
at org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:127)
at org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)

This error stopped the replication thread, so replication came to a halt.

On the sending instance I see the following error:


WARNING: Message lost: [192.168.13.17:4.001]
type=[org.apache.catalina.cluster.session.SessionMessageImpl],
id=[C48819FFB61BD5EC7A37867EA1626B5F.1-1133790203828]
java.net.SocketException: Software caused connection abort: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
at org.apache.catalina.cluster.tcp.DataSender.writeData(DataSender.java:858)
at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:799)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:476)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:442)

My cluster config looks like this:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true"
         notifyListenersOnReplication="true"
         doClusterLog="true"
         clusterLogName="clusterlog">

    <Membership
        className="org.apache.catalina.cluster.mcast.McastService"
        mcastAddr="228.0.0.4"
        mcastBindAddress="192.168.13.7"
        mcastPort="45564"
        mcastFrequency="500"
        mcastDropTime="3000"/>

    <Receiver
        className="org.apache.catalina.cluster.tcp.ReplicationListener"
        tcpListenAddress="auto"
        tcpListenPort="4001"
        tcpSelectorTimeout="100"
        tcpThreadCount="6"/>

    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="fastasyncqueue"
        compress="true"
        doTransmitterProcessingStats="true"
        waitForAck="false"
        autoConnect="false"/>

    <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
           filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

    <ClusterListener
        className="org.apache.catalina.cluster.session.ClusterSessionListener"/>

</Cluster>

I use Tomcat 5.5.13 (beta) with jdk1.5.0_6 on Win2000 SP4.

Regards, Dietmar
Comment 1 Peter Rossbach 2005-12-07 09:41:05 UTC
Very strange! I can't reproduce this behaviour on Windows XP or SUSE Linux.

Please test your config with the SocketReplicationListener:

<Receiver
    className="org.apache.catalina.cluster.tcp.SocketReplicationListener"
    tcpListenAddress="auto"
    tcpListenPort="4001"/>

Peter

                                  
Comment 2 Peter Rossbach 2005-12-12 13:30:13 UTC
Found the bug inside XByteBuffer. The message header length has changed, and on
networks that split a message into smaller chunks the reported exception can occur.

A very production-critical bug, Arrghh!

Thanks, Dietmar, for reporting the bug and testing the fix.
Peter

PS: Fixed in 5.5.15.
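
For readers wondering how a split message can lead to a wildly negative array index: the sketch below is not Tomcat's XByteBuffer and not the actual fix. It is a minimal illustration that assumes a hypothetical wire format of a 4-byte big-endian length followed by the payload. The names FrameBuffer, HEADER_LENGTH, MAX_PAYLOAD, append, drain and frame are all made up for this example. The idea is that a receiver should only trust a length field once the whole header has arrived, should reject implausible lengths, and should wait when the payload is still incomplete.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT Tomcat's XByteBuffer.
// Assumed wire format: [4-byte big-endian payload length][payload bytes].
class FrameBuffer {
    static final int HEADER_LENGTH = 4;          // assumed header size
    static final int MAX_PAYLOAD = 1024 * 1024;  // assumed sanity limit

    private byte[] buf = new byte[0];

    // Append one chunk exactly as it arrived from the socket.
    void append(byte[] chunk, int off, int len) {
        byte[] grown = new byte[buf.length + len];
        System.arraycopy(buf, 0, grown, 0, buf.length);
        System.arraycopy(chunk, off, grown, buf.length, len);
        buf = grown;
    }

    // Return every complete payload; keep any trailing partial frame buffered.
    List<byte[]> drain() throws IOException {
        List<byte[]> frames = new ArrayList<>();
        int pos = 0;
        // Only read the length once the full header is present.
        while (buf.length - pos >= HEADER_LENGTH) {
            int size = ByteBuffer.wrap(buf, pos, HEADER_LENGTH).getInt();
            // A misaligned or corrupt header typically decodes to garbage,
            // often a negative int; fail cleanly instead of indexing with it.
            if (size < 0 || size > MAX_PAYLOAD) {
                throw new IOException("Corrupt frame length: " + size);
            }
            if (buf.length - pos - HEADER_LENGTH < size) {
                break; // payload not fully received yet, wait for more data
            }
            byte[] payload = new byte[size];
            System.arraycopy(buf, pos + HEADER_LENGTH, payload, 0, size);
            frames.add(payload);
            pos += HEADER_LENGTH + size;
        }
        // Drop consumed bytes, keep the remainder for the next append.
        byte[] rest = new byte[buf.length - pos];
        System.arraycopy(buf, pos, rest, 0, rest.length);
        buf = rest;
        return frames;
    }

    // Helper that builds one frame in the assumed format.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(HEADER_LENGTH + payload.length)
                .putInt(payload.length).put(payload).array();
    }
}
```

With this shape, feeding a frame in three arbitrary pieces yields nothing from drain() until the last piece arrives, and a header that decodes to a negative length raises an IOException that the caller can handle by closing just that connection.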