Readme - Clustering Jakarta Tomcat 5.5.9 fix

Date: 10.04.2005
Author: Peter Rossbach

After some load tests, the current clustering code shows some bugs:

- In some cases the complete cluster hangs (Linux SuSE 9.1, Windows XP).
- Memory consumption with fastasyncqueue is too high under heavy load.
  Problem with waitAck and the synchronized DataSender#pushMessage.
- Sometimes no requests are processed.
  Trouble with the SessionManager background thread and a wrong autoConnect sync block.

Here are my fixes:

DataSender
* No sync pushMessage with async background thread
* KeepAliveMaxRequestCount set to -1 (disabled), only time-based keep-alive
* Socket open counter is updated after a successful open
* More trace messages
* Better wait-ack handling

FastAsyncSocketSender
* Move counter queuedNrOfBytes to the background thread
* Sync only the counter and not the message queueing (very important performance gap)
* More trace messages

AsyncSocketSender
* Move counter queuedNrOfBytes to the background thread
* Sync only the counter and not the message queueing (very important performance gap)
* More trace messages

Jdk13ReplicationListener
* Add socket listener; renamed to SocketReplicationListener in releases after 5.5.9
* More trace messages for better understanding

ReplicationTransmitter
* Set autoConnect to false (with autoConnect, a second thread can close a socket that another thread is still using, which is very bad!)
* Sync the autoConnect sender check

ReplicationListener
* More trace messages for better understanding

PooledSocket
* sendMessage uses autoConnect (not heavily tested)

===========================================================================

With 5.5.9 the following cluster sender configurations are possible:

### pooled ###
    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="pooled"
        ackTimeout="@node.ackTimeout@"/>

### fastasyncqueue ###
// When you set compress="false" you must also set it at the receiver!
// Test that maxQueueLength is big enough!
    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="fastasyncqueue"
        compress="false"
        doProcessingStats="true"
        queueTimeWait="true"
        maxQueueLength="1000"
        queueDoStats="true"
        queueCheckLock="true"
        ackTimeout="15000"
        waitForAck="true"
        autoConnect="false"
        keepAliveTimeout="@node.ackTimeout@"
        keepAliveMaxRequestCount="-1"/>

### asynchronous ###
// When you set compress="false" you must also set it at the receiver!
    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="asynchronous"
        compress="false"
        ackTimeout="15000"
        waitForAck="true"
        autoConnect="false"
        keepAliveTimeout="@node.ackTimeout@"
        keepAliveMaxRequestCount="-1"/>

### synchronous ###
    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="synchronous"
        compress="false"
        ackTimeout="15000"
        waitForAck="true"
        keepAliveTimeout="@node.ackTimeout@"
        keepAliveMaxRequestCount="-1"/>

With this fix I implemented a simple socket receiver that does not use NIO: Jdk13ReplicationListener.
In releases after 5.5.9 the name changes to SocketReplicationListener.

    <Receiver
        className="org.apache.catalina.cluster.tcp.Jdk13ReplicationListener"
        tcpListenAddress="@node.clustertcp.address@"
        tcpListenPort="@node.clustertcp.port@" />

==========================================================

Compile from source:

* Get the Tomcat 5.5.9 binary release
* Get this fix pack jakarta-tomcat-5.5.9-cluster-fix-src-<date>.tar.gz and extract it
* Edit build.properties and set catalina.home to your 5.5.9 release
    catalina.home=d:/server/jakarta-tomcat-5.5.9
* Then compile and install
    ant compile install
* Or install manually
    Copy build/classes to server/classes of your Tomcat release

==========================================================

Sorry for the release trouble :-(

Peter

See the attachments to this bug report.
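The "sync only the counter and not the message queueing" fix listed for FastAsyncSocketSender and AsyncSocketSender can be illustrated roughly as follows. This is a minimal sketch and not the code from the fix pack: the class name, the field `countedBytes`, and the use of `java.util.concurrent.ConcurrentLinkedQueue` are all assumptions made for illustration. The idea is that request threads enqueue without entering a shared synchronized block, and only the background sender thread updates the byte counter under a small lock.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical illustration only; not the Tomcat cluster source.
 * Producers never block on the counter lock, the sender thread does the counting.
 */
class CounterOnlySyncQueue {
    // Lock-free queue: enqueueing needs no synchronized block.
    private final ConcurrentLinkedQueue<byte[]> queue = new ConcurrentLinkedQueue<>();
    private final Object counterLock = new Object();
    private long countedBytes = 0; // hypothetical stand-in for a byte counter

    /** Called by request threads; does not touch the counter. */
    void enqueue(byte[] message) {
        queue.add(message);
    }

    /** Called by the single background sender thread; returns messages drained. */
    int drainOnce() {
        int drained = 0;
        byte[] message;
        while ((message = queue.poll()) != null) {
            // A real sender would write the message to the socket here.
            synchronized (counterLock) { // only the counter update is synchronized
                countedBytes += message.length;
            }
            drained++;
        }
        return drained;
    }

    long getCountedBytes() {
        synchronized (counterLock) {
            return countedBytes;
        }
    }

    public static void main(String[] args) {
        CounterOnlySyncQueue q = new CounterOnlySyncQueue();
        q.enqueue(new byte[782]);
        q.enqueue(new byte[18]);
        System.out.println(q.drainOnce() + " messages, " + q.getCountedBytes() + " bytes");
    }
}
```

Keeping the lock off the enqueue path means a slow or blocked sender no longer stalls request threads, which matches the hang and throughput symptoms described above, though the actual fix may be structured differently.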
Created attachment 14671
Binary Cluster 5.5.9 fix pack

jakarta-tomcat-5.5.9 cluster fix pack /server/classes
Created attachment 14672
Source Cluster 5.5.9 fix pack

Source of the cluster fix pack
Please help me to fix the bug.
Download the binary attachment and extract it into your Tomcat 5.5.9 distribution.

Peter
I'm using this patch on 5.5.9 with 8 hosts clustered together. I was seeing a lot of memberDisappeared errors; I'm still seeing them, but now with more detail. Here's an example error from catalina.out:

Jul 18, 2005 5:40:51 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://10.0.0.15:4002,10.0.0.15,4002, alive=1018550]
Jul 18, 2005 5:40:51 PM org.apache.catalina.cluster.tcp.DataSender pushMessage
INFO: resending 782 bytes to 10.0.0.15:4002 from 55784
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:162)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at org.apache.catalina.cluster.tcp.DataSender.waitForAck(DataSender.java:542)
        at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:504)
        at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:401)

A typical cluster config is:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         name="hydraNation"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true"
         notifyListenersOnReplication="true">

    <Membership
        className="org.apache.catalina.cluster.mcast.McastService"
        mcastAddr="228.0.0.4"
        mcastPort="45564"
        mcastFrequency="700"
        mcastDropTime="5000"/>

    <Receiver
        className="org.apache.catalina.cluster.tcp.Jdk13ReplicationListener"
        tcpListenAddress="10.0.0.12"
        compress="false"
        tcpListenPort="4002" />

    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="fastasyncqueue"
        compress="false"
        doProcessingStats="true"
        queueTimeWait="true"
        maxQueueLength="1000"
        queueDoStats="true"
        queueCheckLock="true"
        ackTimeout="15000"
        waitForAck="true"
        autoConnect="false"
        keepAliveTimeout="@node.ackTimeout@"
        keepAliveMaxRequestCount="-1"/>

    <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
           filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

    <Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
              tempDir="/tmp/war-temp/"
              deployDir="/tmp/war-deploy/"
              watchDir="/tmp/war-listen/"
              watchEnabled="false"/>
</Cluster>

Any ideas?
The cluster is designed to be used with cluster domains. Split your cluster into pairs of backup nodes and use the new mod_jk 1.2.14 domain attribute to dispatch requests from failed nodes to the right backup. Every cluster domain uses a different mcastAddr or mcastPort.

Your membership attributes are also a problem in a real production environment:

        mcastFrequency="700"
        mcastDropTime="5000"/>

Use instead:

        mcastFrequency="1000"
        mcastDropTime="30000"/>

Peter
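To see why the two membership settings behave so differently, it can help to count how many heartbeat intervals fit into the drop timeout. The sketch below assumes that mcastFrequency is the heartbeat interval in milliseconds and that mcastDropTime is the time in milliseconds without a heartbeat after which a member is reported as disappeared; the class and method names are hypothetical.

```java
/** Hypothetical helper; only does the arithmetic on the two attribute values. */
class HeartbeatBudget {
    /** Whole heartbeat intervals that fit into the drop timeout. */
    static long intervalsBeforeDrop(long mcastFrequencyMs, long mcastDropTimeMs) {
        if (mcastFrequencyMs <= 0) {
            throw new IllegalArgumentException("mcastFrequency must be positive");
        }
        return mcastDropTimeMs / mcastFrequencyMs; // integer division
    }

    public static void main(String[] args) {
        // Settings from the reported config.
        System.out.println(intervalsBeforeDrop(700, 5000));   // prints 7
        // Settings recommended above.
        System.out.println(intervalsBeforeDrop(1000, 30000)); // prints 30
    }
}
```

Under these assumptions, the reported config drops a member after roughly 7 missed intervals (5 seconds), so a long garbage collection pause or a short network hiccup on a loaded node could be enough to trigger memberDisappeared. The recommended values allow about 30 intervals (30 seconds).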