Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Abandoned
Affects Version/s: 5.10.0, 5.11.1
None
None
Environment: Solaris 5.11
Description
When a slave takes over for a failed master, pending messages are not delivered.
I have a 5.11 cluster consisting of two master/slave pairs: m1/s1 and m2/s2. The brokers use multicast://default for their networkConnectors. There is one publisher and one subscriber, both of which also connect using multicast URLs. The subscriber is a durable subscriber, and the messages are persistent.
I am testing system robustness in the face of a master failure. I have three test cases, of which two behave as expected and one is problematic. The publisher connects to a master, sends a set of 10 persistent messages, and exits. The durable subscriber receives a message, spends 1 second simulating processing time, and then waits for the next message (auto-acknowledge).
For each test case I connect the subscriber, publish the message set, and then kill a master after the subscriber has received a few messages. When the slave comes online, I expect the remaining messages to be delivered.
1. Subscribe to m2, publish to m2, kill m2. All messages are delivered.
2. Subscribe to m1, publish to m2, kill m2. All messages are delivered.
3. Subscribe to m1, publish to m2, kill m1. The remaining messages are NOT DELIVERED.
In case #3, when m1 is killed I can see the subscriber reconnecting to m2. However, the remaining messages are not delivered at that point.
If I then connect the subscriber directly to s1 (using a tcp:// URL), the remaining messages are delivered. I would have expected s1 to route the remaining messages to m2 during the test execution, but that did not happen.
By "kill" I mean that I run "kill -9 XXXX" on the master process.
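To make the three test cases easier to compare, here is a toy model of them in Python. It is not ActiveMQ code and does not claim to identify the root cause. It only reproduces the outcomes reported above under stated assumptions: each master/slave pair shares one message store that survives a failover, a message crosses the broker network at most once, the subscriber receives 3 messages before the kill (the report only says "a few"), and a subscriber whose broker is killed reconnects to the other pair, as observed in case #3. The names run_case, forward, and received_before_kill are hypothetical.

```python
def run_case(subscribe_to, publish_to, kill, total=10, received_before_kill=3):
    """Toy model of one test case.

    Returns (received, stranded): how many messages the subscriber gets in
    total, and how many stay behind in a store it is not connected to.
    """
    # One store per master/slave pair, holding (msg_id, already_forwarded).
    # Assumption: the slave takes over the same store when the master dies.
    store = {"1": [], "2": []}

    def pair_of(broker):
        return broker[1]  # "m1" -> "1", "s2" -> "2"

    def forward(src, dst):
        # Assumption: a message may cross the broker network only once.
        movable = [(m, True) for m, fwd in store[src] if not fwd]
        store[src] = [(m, fwd) for m, fwd in store[src] if fwd]
        store[dst].extend(movable)

    sub_pair, pub_pair = pair_of(subscribe_to), pair_of(publish_to)

    # The publisher sends all messages and exits.
    store[pub_pair] = [(i, False) for i in range(total)]
    if sub_pair != pub_pair:
        # Demand from the remote durable subscriber pulls the messages over.
        forward(pub_pair, sub_pair)

    # The subscriber processes a few messages before the master is killed.
    received = min(received_before_kill, len(store[sub_pair]))
    del store[sub_pair][:received]

    if kill == subscribe_to:
        # Assumption: the subscriber reconnects to the other pair.
        new_pair = "2" if sub_pair == "1" else "1"
        forward(sub_pair, new_pair)
        sub_pair = new_pair

    # Everything in the store the subscriber is now attached to is delivered.
    received += len(store[sub_pair])
    store[sub_pair].clear()

    stranded = sum(len(pending) for pending in store.values())
    return received, stranded


if __name__ == "__main__":
    cases = [("m2", "m2", "m2"), ("m1", "m2", "m2"), ("m1", "m2", "m1")]
    for number, case in enumerate(cases, start=1):
        print(number, case, run_case(*case))
```

Under these assumptions, cases 1 and 2 return (10, 0), while case 3 returns (3, 7): the 7 remaining messages stay in the m1/s1 store, which matches the observation that they are delivered only when the subscriber connects directly to s1.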