Bug 34389 - Tomcat 5.5.9 Cluster fix pack
Summary: Tomcat 5.5.9 Cluster fix pack
Status: RESOLVED FIXED
Product: Tomcat 5
Classification: Unclassified
Component: Catalina:Cluster
Version: 5.5.9
Hardware: Other Windows XP
Importance: P2 blocker
Assignee: Tomcat Developers Mailing List
URL: http://localhost:8080/jsp-examples/Vi...
Reported: 2005-04-10 12:09 UTC by Peter Rossbach
Modified: 2005-07-19 02:45 UTC



Attachments
Binary Cluster 5.5.9 fix pack (35.98 KB, application/zip)
2005-04-10 12:11 UTC, Peter Rossbach
Source Cluster 5.5.9 Cluster fix pack (28.07 KB, application/zip)
2005-04-10 12:12 UTC, Peter Rossbach

Description Peter Rossbach 2005-04-10 12:09:21 UTC
Readme - Clustering Jakarta Tomcat 5.5.9 fix
Date: 10.04.2005
Author: Peter Rossbach

Load tests of the current clustering code revealed several bugs:

- In some cases the complete cluster hangs (SuSE Linux 9.1, Windows XP).
- Memory consumption with fastasyncqueue is too high under heavy load.
       Problems with wait-for-ack handling and the synchronized DataSender#pushMessage.
- Sometimes no requests are processed at all.
       Trouble with the SessionManager background thread and a wrong autoConnect
       synchronized block.

Here are my fixes:

DataSender
*   pushMessage is no longer synchronized when used with the async background thread
*   keepAliveMaxRequestCount set to -1 (disabled); keep-alive is now time-based only
*   The socket open counter is incremented only after a successful open!
*   More trace messages
*   Better wait-for-ack handling
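The keep-alive change can be pictured with a small check. This is a minimal sketch, not the DataSender source: the class and method names are hypothetical, and it assumes that a negative keepAliveTimeout disables the time limit the same way keepAliveMaxRequestCount="-1" disables the request-count limit.

```java
/**
 * Hypothetical sketch (not the Tomcat DataSender source) of a keep-alive
 * check where a negative limit switches that criterion off.
 */
class KeepAliveCheck {

    /**
     * Returns true when the sender should close and later reopen its socket.
     *
     * @param requestCount    messages sent over the current connection
     * @param maxRequestCount keepAliveMaxRequestCount; -1 disables the count limit
     * @param connectTime     time (ms) the socket was opened
     * @param now             current time (ms)
     * @param timeout         keepAliveTimeout (ms); -1 is ASSUMED to disable the time limit
     */
    static boolean shouldClose(long requestCount, long maxRequestCount,
                               long connectTime, long now, long timeout) {
        boolean countExpired = maxRequestCount > -1 && requestCount >= maxRequestCount;
        boolean timeExpired = timeout > -1 && (now - connectTime) > timeout;
        return countExpired || timeExpired;
    }

    public static void main(String[] args) {
        // keepAliveMaxRequestCount="-1": a huge request count alone never closes the socket...
        System.out.println(shouldClose(1_000_000, -1, 0, 10_000, 15_000)); // false
        // ...only the elapsed connection time does.
        System.out.println(shouldClose(1_000_000, -1, 0, 20_000, 15_000)); // true
    }
}
```

With the count limit disabled, a busy connection is recycled purely by age, so a burst of replication traffic no longer forces reconnects.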

FastAsyncSocketSender
*   Moved the queuedNrOfBytes counter update to the background thread
*   Synchronize only the counter, not the message queueing (very important for performance)
*   More trace messages

AsyncSocketSender
*   Moved the queuedNrOfBytes counter update to the background thread
*   Synchronize only the counter, not the message queueing (very important for performance)
*   More trace messages
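The "synchronize only the counter" change for FastAsyncSocketSender and AsyncSocketSender can be sketched as follows. This is a hypothetical illustration in modern Java (java.util.concurrent did not exist on the JDK 1.4 baseline of that era), not the Tomcat source; only the counter name queuedNrOfBytes comes from the notes above.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical sketch (not the Tomcat source): request threads enqueue
 * without taking a lock, and only the byte counter, which the background
 * thread maintains, is synchronized.
 */
class CounterOnlySyncSender {
    // Lock-free queue: queueMessage() never blocks on a monitor.
    private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();
    private long queuedNrOfBytes; // guarded by "this"

    /** Called by request threads. */
    void queueMessage(byte[] data) {
        queue.add(data);
    }

    /** The only synchronized write; called from the background thread. */
    private synchronized void addQueuedNrOfBytes(int n) {
        queuedNrOfBytes += n;
    }

    synchronized long getQueuedNrOfBytes() {
        return queuedNrOfBytes;
    }

    /** Background thread body: take everything queued so far and count it. */
    void drainQueue() {
        byte[] data;
        while ((data = queue.poll()) != null) {
            // A real sender would write `data` to the socket here.
            addQueuedNrOfBytes(data.length);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CounterOnlySyncSender sender = new CounterOnlySyncSender();
        sender.queueMessage(new byte[10]);
        sender.queueMessage(new byte[20]);
        sender.queueMessage(new byte[30]);
        Thread background = new Thread(sender::drainQueue);
        background.start();
        background.join();
        System.out.println(sender.getQueuedNrOfBytes()); // 60
    }
}
```

Request threads never contend on a monitor just to hand over a message; the short synchronized section is only entered by the single background thread and by readers of the statistic.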

Jdk13ReplicationListener
*   Added a plain socket listener; it will be renamed SocketReplicationListener after 5.5.9
*   More trace messages for better understanding

ReplicationTransmitter
*   Set autoConnect to false (very bad: a second thread can close a socket that another thread is still using!)
*   Synchronized the autoConnect sender check
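Why the autoConnect check has to be synchronized can be shown with a toy sender. This is a hypothetical sketch, not the ReplicationTransmitter or DataSender code: the class and its methods are invented for illustration, and the "socket" is just a flag.

```java
/**
 * Hypothetical sketch (not the Tomcat source): the "connected?" check, the
 * reconnect and the send happen under one lock, so no other thread can
 * close the socket in the middle of a send.
 */
class GuardedSender {
    private boolean connected;
    private int connectCount;
    private int sentCount;

    /** Check-then-connect as one atomic step. */
    synchronized void ensureConnected() {
        if (!connected) {
            connected = true; // a real sender would open the socket here
            connectCount++;
        }
    }

    /** Holding the lock while sending keeps disconnect() out until we are done. */
    synchronized void sendMessage(byte[] data) {
        ensureConnected(); // reentrant: the lock is already held
        sentCount++;       // a real sender would write `data` here
    }

    synchronized void disconnect() {
        connected = false; // a real sender would close the socket here
    }

    synchronized int getConnectCount() { return connectCount; }
    synchronized int getSentCount() { return sentCount; }

    public static void main(String[] args) throws InterruptedException {
        GuardedSender sender = new GuardedSender();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> sender.sendMessage(new byte[1]));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Eight concurrent senders, but only one connect.
        System.out.println(sender.getConnectCount() + " " + sender.getSentCount()); // 1 8
    }
}
```

Without the shared lock, two threads could both see "not connected" and open sockets, or one could close the socket while another is writing to it.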

ReplicationListener
*   More trace messages for better understanding

PooledSocket
*   sendMessage uses autoConnect (not heavily tested)

===========================================================================
With 5.5.9 the following cluster sender configurations are possible:

### pooled ###
             <Sender
                  className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                  replicationMode="pooled"
                  ackTimeout="@node.ackTimeout@"/>
 
### fastasyncqueue ###
// If you set compress="false" here, you must also set it on the Receiver!
// Make sure maxQueueLength is big enough!
             <Sender
                  className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                  replicationMode="fastasyncqueue"
                  compress="false"
                  doProcessingStats="true"
                  queueTimeWait="true"
                  maxQueueLength="1000"
                  queueDoStats="true"
                  queueCheckLock="true"
                  ackTimeout="15000"
                  waitForAck="true"
                  autoConnect="false"
                  keepAliveTimeout="@node.ackTimeout@"
                  keepAliveMaxRequestCount="-1"/>

### asynchronous ###
// If you set compress="false" here, you must also set it on the Receiver!
             <Sender
                  className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                  replicationMode="asynchronous"
                  compress="false"
                  ackTimeout="15000"
                  waitForAck="true"
                  autoConnect="false"
                  keepAliveTimeout="@node.ackTimeout@"
                  keepAliveMaxRequestCount="-1"/>


### synchronous ###
             <Sender
                  className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                  replicationMode="synchronous"
                  compress="false"
                  ackTimeout="15000"
                  waitForAck="true"
                  keepAliveTimeout="@node.ackTimeout@"
                  keepAliveMaxRequestCount="-1"/>

With this fix I implemented a simple socket receiver that does not use NIO:
Jdk13ReplicationListener.
In releases after 5.5.9 I will rename it to SocketReplicationListener.
               <Receiver
                  className="org.apache.catalina.cluster.tcp.Jdk13ReplicationListener"
                  tcpListenAddress="@node.clustertcp.address@"
                  tcpListenPort="@node.clustertcp.port@"
                  />
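The sender samples warn that compress="false" must be set on both the Sender and the Receiver. A mismatch like that can be caught before deployment with a small check. This is a hypothetical helper, not part of Tomcat; it uses only the JDK DOM parser, and because the notes above do not state the default value of compress, it only verifies that an explicit compress="false" appears on both sides or on neither.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * Hypothetical helper (not part of Tomcat): flags a cluster config in which
 * only one of Sender/Receiver explicitly sets compress="false".
 */
class CompressCheck {

    static boolean compressConsistent(String clusterXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(clusterXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse cluster config", e);
        }
        Element sender = (Element) doc.getElementsByTagName("Sender").item(0);
        Element receiver = (Element) doc.getElementsByTagName("Receiver").item(0);
        if (sender == null || receiver == null) {
            throw new IllegalArgumentException("config needs both <Sender> and <Receiver>");
        }
        // getAttribute() returns "" for a missing attribute.
        boolean senderOff = "false".equals(sender.getAttribute("compress"));
        boolean receiverOff = "false".equals(receiver.getAttribute("compress"));
        return senderOff == receiverOff;
    }

    public static void main(String[] args) {
        String ok = "<Cluster><Receiver compress=\"false\"/><Sender compress=\"false\"/></Cluster>";
        String bad = "<Cluster><Receiver/><Sender compress=\"false\"/></Cluster>";
        System.out.println(compressConsistent(ok));  // true
        System.out.println(compressConsistent(bad)); // false
    }
}
```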
==========================================================
Compile from source:

* Get the Tomcat 5.5.9 binary release.
* Get this fix pack (jakarta-tomcat-5.5.9-cluster-fix-src-<date>.tar.gz) and extract it.
* Edit build.properties and set catalina.home to your 5.5.9 release, e.g.
catalina.home=d:/server/jakarta-tomcat-5.5.9
* Then compile and install:
ant compile install

* To install manually:
Copy build/classes to the server/classes directory of your Tomcat release.

==========================================================
                  
Sorry for the release trouble :-(
Peter

See the attachments to this bug report.
Comment 1 Peter Rossbach 2005-04-10 12:11:32 UTC
Created attachment 14671
Binary Cluster 5.5.9 fix pack

jakarta-tomcat-5.5.9 cluster fix pack /server/classes
Comment 2 Peter Rossbach 2005-04-10 12:12:58 UTC
Created attachment 14672
Source Cluster 5.5.9 Cluster fix pack

Source of the cluster fix pack
Comment 3 Hally Sadia Harun 2005-06-12 05:21:28 UTC
Please help me to fix the bug.
Comment 4 Peter Rossbach 2005-06-12 09:19:16 UTC
Download the binary attachment and extract it into your Tomcat 5.5.9 distribution.
Peter
Comment 5 Joshua Szmajda 2005-07-18 23:42:46 UTC
I'm using this patch on 5.5.9 with 8 hosts clustered together. I was seeing a lot
of memberDisappeared errors; I'm still seeing them, but now with more detail.
Here's an example error from catalina.out:

Jul 18, 2005 5:40:51 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://10.0.0.15:4002,10.0.0.15,4002, alive=1018550]
Jul 18, 2005 5:40:51 PM org.apache.catalina.cluster.tcp.DataSender pushMessage
INFO: resending 782 bytes to 10.0.0.15:4002 from 55784
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:162)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at org.apache.catalina.cluster.tcp.DataSender.waitForAck(DataSender.java:542)
        at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:504)
        at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:401)


A typical cluster config is:
        <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
                 name="hydraNation"
                 managerClassName="org.apache.catalina.cluster.session.DeltaManager"
                 expireSessionsOnShutdown="false"
                 useDirtyFlag="true"
                 notifyListenersOnReplication="true">

            <Membership
                className="org.apache.catalina.cluster.mcast.McastService"
                mcastAddr="228.0.0.4"
                mcastPort="45564"
                mcastFrequency="700"
                mcastDropTime="5000"/>

            <Receiver
                className="org.apache.catalina.cluster.tcp.Jdk13ReplicationListener"
                tcpListenAddress="10.0.0.12"
                compress="false"
                tcpListenPort="4002"
                />

            <Sender
                  className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                  replicationMode="fastasyncqueue"
                  compress="false"
                  doProcessingStats="true"
                  queueTimeWait="true"
                  maxQueueLength="1000"
                  queueDoStats="true"
                  queueCheckLock="true"
                  ackTimeout="15000"
                  waitForAck="true"
                  autoConnect="false"
                  keepAliveTimeout="@node.ackTimeout@"
                  keepAliveMaxRequestCount="-1"/>

            <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
                   filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

            <Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
                      tempDir="/tmp/war-temp/"
                      deployDir="/tmp/war-deploy/"
                      watchDir="/tmp/war-listen/"
                      watchEnabled="false"/>
        </Cluster>


Any ideas?
Comment 6 Peter Rossbach 2005-07-19 10:45:38 UTC
The cluster is designed to be used with cluster domains.
Split your cluster into pairs of backup nodes and use the new
domain attribute of mod_jk 1.2.14 to dispatch requests for failed nodes to the
right backup. Every cluster domain must use a different mcastAddr or mcastPort.
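For the 8-host setup above, the domain layout might look like this. It is a hypothetical sketch: pairing consecutive nodes and numbering ports upward from the reporter's mcastPort 45564 are illustrative assumptions, and a distinct mcastAddr per domain would work just as well.

```java
/**
 * Hypothetical sketch: split nodes into backup pairs (cluster domains) and
 * give every domain its own mcastPort.
 */
class DomainPlan {

    /** ASSUMPTION: nodes 2k and 2k+1 form domain k. */
    static int domainOf(int nodeIndex) {
        return nodeIndex / 2;
    }

    /** ASSUMPTION: each domain gets a distinct multicast port, basePort + domain. */
    static int mcastPortFor(int nodeIndex, int basePort) {
        return basePort + domainOf(nodeIndex);
    }

    public static void main(String[] args) {
        int basePort = 45564; // the mcastPort from the reporter's config
        for (int node = 0; node < 8; node++) {
            System.out.println("node " + node + " -> domain " + domainOf(node)
                    + ", mcastPort=" + mcastPortFor(node, basePort));
        }
    }
}
```

Each pair then only replicates sessions to its own partner, so membership traffic and session copies no longer grow with the total number of hosts.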

Also, your membership attributes are a problem in a real production environment:
                mcastFrequency="700"
                mcastDropTime="5000"/>

Use instead:
                mcastFrequency="1000"
                mcastDropTime="30000"/>
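A rough way to compare the two settings is to count how many heartbeat intervals fit into the drop time. This sketch assumes mcastFrequency is the interval between membership heartbeats and mcastDropTime is how long a member may stay silent before it is reported as disappeared; the class is purely illustrative.

```java
/**
 * Rough arithmetic: how many heartbeat intervals fit into mcastDropTime
 * before a silent member is reported as disappeared.
 */
class McastTolerance {

    static long heartbeatIntervalsBeforeDrop(long mcastFrequencyMillis, long mcastDropTimeMillis) {
        return mcastDropTimeMillis / mcastFrequencyMillis;
    }

    public static void main(String[] args) {
        System.out.println(heartbeatIntervalsBeforeDrop(700, 5000));   // 7  (reporter's settings)
        System.out.println(heartbeatIntervalsBeforeDrop(1000, 30000)); // 30 (recommended settings)
    }
}
```

With 700/5000, a node that stalls for just over 5 seconds under heavy load is already dropped; 1000/30000 tolerates about 30 heartbeat intervals of silence.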

Peter