If I understand it correctly, pooled replication should ensure that the session is available on all cluster nodes as soon as the request finishes. It doesn't seem to do that in the following test case: the attached simple webapp (clustertest.war) just prints the content of the session attribute "host" and then stores its hostname in that attribute. This webapp is installed on two Tomcat 5.5.11 instances, tomcat1 and tomcat2, and mod_jk is used as a load balancer with sticky sessions disabled (config files are attached). A JMeter 2.1 test plan makes repeated requests to the webapp, waiting 20 ms between requests.

Expected behaviour: the first request yields null, and subsequent ones return tomcat1, then tomcat2, tomcat1, tomcat2, and so on.

Actual behaviour: several requests return null, and the rest do not alternate exactly. In this test case the output stabilized with a delay of about 50 ms between requests, but other webapps with larger sessions needed considerably more time. This is quite serious, since correctness is not ensured when sticky sessions are not used.
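For reference, the attached test page is essentially equivalent to a JSP like the following. This is a reconstruction from the description above, not the exact attachment; only the attribute name "host" is taken from the report:

```jsp
<%-- Prints the previous value of the session attribute "host",
     then stores this node's hostname in it. --%>
<%@ page import="java.net.InetAddress" %>
<%
    Object previous = session.getAttribute("host");
    out.println(previous);
    session.setAttribute("host", InetAddress.getLocalHost().getHostName());
%>
```

With round-robin balancing and working synchronous replication, each node should print the hostname stored by the node that served the previous request.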
Created attachment 16329 [details] the test case and config files
How about not filing a bug? It should be evident that sticky sessions are mandatory, or you're going to run into problems.
*** Bug 36542 has been marked as a duplicate of this bug. ***
I did some further testing, and my test case works as expected with Tomcat 5.5.9 without the cluster fix pack (Bug 34389), so you have a regression there, which should be fixed. In any case, this bug isn't INVALID; it's either a bug or a WONTFIX, so I am reopening it. You should really fix this, since many load balancers don't support Tomcat's sticky sessions.
As Remy told you, sticky sessions are mandatory for Tomcat clustering to work. Clustering is a fallback mechanism. Peter
I don't agree with that at all. Sticky sessions are a great benefit, but a pooled synchronized cluster should still work. At least it did in the good ol' days :) Peter, do you know what changed in the "cluster fix pack" that made the synchronization stop working properly? Filip
(In reply to comment #6) > I don't agree with that at all, sticky sessions is a great benefit, but pooled > synchronized cluster should be working. > > at least it was in the good ol' days :) No, it never was: the request might be complete from an HTTP standpoint while the servlet is still running (i.e., the client might send the next request while the replication hasn't been done yet). Besides, the spec requires that all concurrent requests belonging to the same session be processed by the same host. Non-sticky sessions are broken in many cases, period. I'll let you close this as INVALID again.
> No, it never was: the request might be complete from a HTTP standpoint while the > servlet is still running (= the client might send the next request, while the > replication hasn't been done yet). Yes, this scenario has never been supported; that is correct. But single-threaded client synchronization has always worked. Christoph, if your scenario is a single client thread per session, then let us know; otherwise we will close this bug.
(In reply to comment #8) > yes, this scenario has never been supported. that is correct. > But single thread client synchronization has always worked. Yes, many webapps would work very well with this mode, while some others would not. For this situation, taking several ms to replicate doesn't seem particularly broken to me, although I suppose the 5.5.10 changelog is quite long.
(In reply to comment #9) > Christoph, if the scenario that you have is a single client thread per session, > then let us know, otherwise we will close this bug.

My test case above is actually quite simple. A single JMeter thread makes successive requests to a cluster of two Tomcat servers. A jk load balancer without sticky sessions (for testing only) does the distribution. The JSP page just stores the hostname of the Tomcat server in the session. No fancy multi-threading or anything.

Tomcat 5.5.9 handles this situation correctly and replicates the session before finishing the HTTP request; 5.5.11 does not. So pooled replication is definitely broken. Please either fix it or document that it's broken. If you choose not to fix it, the pooled, synchronous, and asynchronous replication modes are basically useless and should be removed.

Thanks for pointing out that the servlet spec (SRV.7.7.2) states that "Within an application marked as distributable, all requests that are part of a session must be handled by one Java Virtual Machine (JVM) at a time." But in my opinion that means that by the time a request finishes, the session state should already be replicated to all other cluster members. A session containing a single 7-character String seems to need about 20 ms to replicate in my setup; busy servers with large sessions may take several hundred ms, and that is a problem for me.

Bye, Christoph
It's been a while since I ran my test cases, but I will run them again.
The difference between 5.5.9 and 5.5.11 is that waitForAck now defaults to false for all sender modes. Can you check with this config:

<Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="pooled"
        waitForAck="true"
        doTransmitterProcessingStats="true"
        doProcessingStats="true"
        doWaitAckStats="true"
        ackTimeout="15000"/>

Thanks, Peter
I would argue that ackTimeout="15000" should itself be an indicator that wait for ack is true; do you really need two flags to say the same thing? To simplify the implementation, I would use the following logic and remove the waitForAck flag altogether:

ackTimeout > 0: wait for ack true, with the given timeout
ackTimeout = 0: wait for ack false
ackTimeout = -1: wait for ack true, no timeout
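The tri-state semantics proposed above could be sketched like this. Note this is an illustrative sketch only; `AckPolicy` is a hypothetical name, not an actual Tomcat class:

```java
// Hypothetical sketch of collapsing waitForAck + ackTimeout into a single
// setting, as proposed above. AckPolicy is an illustrative name only.
public class AckPolicy {
    final boolean waitForAck;
    final long timeoutMs; // 0 means "wait without a timeout" when waitForAck is true

    public AckPolicy(long ackTimeout) {
        if (ackTimeout > 0) {            // positive: wait for ack, bounded by the timeout
            waitForAck = true;
            timeoutMs = ackTimeout;
        } else if (ackTimeout == 0) {    // zero: do not wait for an ack at all
            waitForAck = false;
            timeoutMs = 0;
        } else {                         // negative (-1): wait for ack indefinitely
            waitForAck = true;
            timeoutMs = 0;
        }
    }
}
```

A single attribute then covers all three behaviours, and an inconsistent combination (waitForAck="false" with a positive ackTimeout) becomes impossible to configure.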
(In reply to comment #12) > The differenc between 5.5.9 and 5.5.11 is that waitForAck is on default false > for all sender modes. > Can you check with this config: <snip>

waitForAck was indeed my problem. Thank you very much for pointing it out.

But now I'm a bit confused about what the difference between pooled and fastasyncqueue cluster replication really is. The docs state that "synchronous replication guarantees the session to be replicated before the request returns", but obviously waitForAck is the controlling setting. Wouldn't it be best to simply ignore the waitForAck setting and have it fixed to false for fastasyncqueue and true for pooled?

The docs also state that "Asynchronous replication, should be used if you have sticky sessions until fail over", which implies that pooled should be used otherwise. But as this bug report has shown, sticky sessions are mandatory for correct clustering. Maybe you should change the docs accordingly. (This should go into another bug report, IMO, but my bug 36542 was resolved as a duplicate.)

Thanks, Christoph
synchronous: send each session change to the other cluster members before returning the response to the client.

asynchronous: same as synchronous, but using multiple sender connections (any one that is not currently busy).

fastasync: put the session change message into a local queue and then return the response to the client. A separate thread waits for messages arriving in the queue and sends them to the other cluster members.

waitForAck: when sending a message, wait for an ACK-type answer from the other cluster members before proceeding (makes sending the messages more reliable).

If you need exact synchronization, use synchronous or pooled mode with waitForAck; the application gets into trouble when replication gets stuck. If you can live with some latency between changes on the primary node and their replication to the other nodes, and want the cluster to affect application performance and stability as little as possible, use session stickiness in the load balancer combined with fastasync and no waitForAck.

synchronous/pooled without waitForAck: lower latency for replication, although synchronization is not exact. fastasync with waitForAck: decouples replication from request/response, but ensures that replication is checked for success.

You are right, we should make the docs more precise. These features are still very new, and as usual documentation takes a while.
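The two recommended combinations described above could look roughly like this in server.xml. This is a sketch based only on the attribute names used earlier in this report, not verified against the 5.5.11 defaults:

```xml
<!-- Exact synchronization: pooled mode, wait for acknowledgement.
     The request does not return until the change is replicated and acked. -->
<Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="pooled"
        waitForAck="true"
        ackTimeout="15000"/>

<!-- Low-impact replication behind sticky sessions: fastasyncqueue, no ack.
     Changes are queued and sent by a background thread. -->
<Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="fastasyncqueue"
        waitForAck="false"/>
```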
Please (anyone involved in this issue) submit the doc enhancements you'd like to see. I'll be glad to quickly look at them and commit them for the next release.
Just marking as an enhancement.
Discussion shows this is a users-list question, not a bug.