My configuration:
2 nodes running Tomcat 7.0.26
Using a custom session manager, which extends the DeltaManager

My startInternal() method first calls super.startInternal(), then performs a few additional initializations.

I reviewed the code of DeltaManager.startInternal(). It calls getAllClusterSessions(), which in turn calls waitForSendAllSessions(), which returns only when getStateTransfered() returns true or a timeout expires. Based on this, I should be able to trust that when the second node starts, the initial sync of all session data from the first node has completed before startInternal() exits (and thus before my initializations run).

This is, however, not the case! I can confirm this by repeatedly logging the value of findSessions().length during my initializations and watching that number go up. There appears to be a race condition between the processing of the message containing the actual session data and the "transfer complete" message.

After tracing this a little further, I see that stateTransfered is set to true in the handleALL_SESSION_TRANSFERCOMPLETE() callback method, and that callback is being called PRIOR to the session data itself even being received!

Here is the debug logging output (slightly scrubbed), which shows the out-of-order messaging:

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [wwwtest#], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[...]. This operation will timeout if no session state has been received within 60 seconds.
Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager messageReceived
FINE: Manager [wwwtest#]: Received SessionMessage of type=(SESSION-STATE-TRANSFERED) from [org.apache.catalina.tribes.membership.MemberImpl[...]
Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager handleALL_SESSION_TRANSFERCOMPLETE
FINE: Manager [wwwtest#] received from node [[B@6789b939:4,000] session state transfered.
Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager messageReceived
FINE: Manager [wwwtest#]: Received SessionMessage of type=(ALL-SESSION-DATA) from [org.apache.catalina.tribes.membership.MemberImpl[...]
Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager handleALL_SESSION_DATA
FINE: Manager [wwwtest#]: received session state data
In case it's helpful, here's the Cluster configuration...fairly basic stuff:

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Manager className="my.deltamanager.extension.CustomManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="239.1.1.1"
                port="45564"
                frequency="500"
                dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="4000"
              autoBind="100"
              selectorTimeout="5000"
              maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="\*\.page"/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
Thanks for the report. I think there is a problem with the behavior of the DeltaManager.

As you know, the DeltaManager is responsible for synchronizing sessions on startup. A node that receives the EVT_GET_ALL_SESSIONS message serializes all of its sessions and sends back an EVT_ALL_SESSION_DATA message. After handing off the EVT_ALL_SESSION_DATA message, it sends an EVT_ALL_SESSION_TRANSFERCOMPLETE message. If channelSendOptions is asynchronous (the default), the EVT_ALL_SESSION_DATA message is sent asynchronously, so the EVT_ALL_SESSION_TRANSFERCOMPLETE message can be processed on the receiving node before the session data arrives. The result is a race condition between the processing of the message containing the actual session data and the "transfer complete" message.

I'm going to fix this behavior by always sending the EVT_ALL_SESSION_DATA message in synchronous mode. In the meantime, the workaround is to set channelSendOptions to 6 (sync + ack).

Best Regards.
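For reference, the workaround could look roughly like the fragment below. This is only a sketch based on the reporter's configuration: channelSendOptions is an attribute of the Cluster element, and 6 is the combination of the synchronized-ack (4) and use-ack (2) send options. Note that this changes the send mode for all cluster messages from this node, not just the session-state transfer, so some extra replication latency should be expected.

```xml
<!-- Hedged sketch of the workaround: only channelSendOptions is added. -->
<!-- 6 = SEND_OPTIONS_SYNCHRONIZED_ACK (4) + SEND_OPTIONS_USE_ACK (2); the default is 8 (asynchronous). -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">
  <!-- Manager, Channel, Valve and ClusterListener elements stay as in the original configuration. -->
</Cluster>
```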
Fixed in trunk and 7.0.x and will be included in 7.0.30 onwards. Proposed for 6.0.x.

Note: With this fix, the EVT_ALL_SESSION_DATA message is sent in synchronous mode. The sender therefore waits for the message to complete, for up to Sender#timeout (default 3000 milliseconds). If a timeout occurs while sending the EVT_ALL_SESSION_DATA message, you can tune the following attributes:
Sender#timeout
DeltaManager#sendAllSessions
DeltaManager#sendAllSessionsSize
DeltaManager#sendAllSessionsWaitTime
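As a rough illustration of where these attributes live, the fragment below extends the reporter's configuration. All of the values are placeholders chosen for the example, not recommendations, and it assumes the sender timeout is set on the nested Transport element, as in the Tomcat cluster sender documentation. Setting sendAllSessions to false makes the DeltaManager send sessions in blocks of sendAllSessionsSize, pausing sendAllSessionsWaitTime milliseconds between blocks, which keeps each synchronous message smaller.

```xml
<!-- Hedged sketch: attribute values below are illustrative assumptions only. -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <!-- Send session state in blocks of 500 sessions, waiting 1000 ms between blocks. -->
  <Manager className="my.deltamanager.extension.CustomManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"
           sendAllSessions="false"
           sendAllSessionsSize="500"
           sendAllSessionsWaitTime="1000"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <!-- Raise the send timeout from the 3000 ms default to 10000 ms. -->
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                 timeout="10000"/>
    </Sender>
    <!-- Membership, Receiver and Interceptor elements stay as in the original configuration. -->
  </Channel>
  <!-- Valve and ClusterListener elements stay as in the original configuration. -->
</Cluster>
```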
Moving to Tomcat 6 since it has been fixed in 7.
Fixed in 6.0.x and will be included in 6.0.36 onwards.