Created attachment 22166 [details]
Demonstrate view inconsistency.

In a four-node cluster using NonBlockingCoordinator, if two nodes fail at the same time, the remaining two nodes get different views and never converge. When the other nodes restart, they never install a view at all.

I've attached the relevant demo code. Run it on 4 machines, wait for view installation, then CTRL-C two of them. The other two will never print the same UniqueId. Start a new node, and the view is always null.

Immediately after the two-node failure, one of the surviving nodes issues this stack trace:

WARN - Member send is failing for:tcp://{-64, -88, -91, 34}:4000 ; Setting to suspect and retrying.
ERROR - Error processing coordination message. Could be fatal.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:2 max:1; Faulty members:tcp://{-64, -88, -91, 34}:4000;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:172)
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:78)
        at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
        at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
        at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.NonBlockingCoordinator.handleMyToken(NonBlockingCoordina
So, I understand this better now and have a proposed fix. Here's the procedure to reproduce the problem:

1) Start four nodes.
2) See a view installation with four members.
3) Kill two non-coordinator nodes in quick succession (within a second or two).

From this point onwards, until it is killed, the coordinator oscillates between two states. It recognizes that the state is inconsistent, because it receives heartbeats from the other surviving node and the UniqueId of that node's view does not match the coordinator's. It then forces an election, which fails because it believes an election is already running. This cycle repeats forever.

When the first node crashed, memberDisappeared() was called on the coordinator, which then started sending messages as part of an election. A send threw here with a connection timeout (it was attempting to send to the second node, which had just crashed). This case is never handled, leaving the 'election in progress' marker set. Forever.

Clearing suggestedviewId when the ChannelException is thrown is the fix:

@@ -500,6 +500,7 @@ public class NonBlockingCoordinator extends ChannelInterceptorBase {
             processCoordMessage(cmsg, msg.getAddress());
         }catch ( ChannelException x ) {
             log.error("Error processing coordination message. Could be fatal.",x);
+            suggestedviewId = null;
         }

This probably should only be done under some circumstances, so this isn't obviously a safe patch. Hopefully the author will have a better fix!
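The failure mode above can be modeled in isolation. The following is a minimal, hypothetical sketch (the class and method names are illustrative, not the real Tribes API): a non-null marker means "election in progress", a send during the election may fail, and unless the failure path clears the marker, every subsequent election attempt is refused forever.

```java
// Hypothetical model of the stuck-election bug. "suggestedviewId" here stands
// in for NonBlockingCoordinator's election-in-progress marker; everything else
// is simplified for illustration.
public class ElectionModel {
    private Object suggestedviewId = null;   // non-null = election in progress
    private final boolean clearOnFailure;    // whether the proposed patch is applied

    public ElectionModel(boolean clearOnFailure) {
        this.clearOnFailure = clearOnFailure;
    }

    /** Returns true if a new election was allowed to start. */
    public boolean startElection(boolean sendWillFail) {
        if (suggestedviewId != null) return false; // believes election already running
        suggestedviewId = new Object();            // mark election in progress
        try {
            send(sendWillFail);                    // may throw, like a failed member send
            suggestedviewId = null;                // normal completion clears the marker
        } catch (RuntimeException x) {
            // Without the patch, the marker stays set forever after a failed send.
            if (clearOnFailure) suggestedviewId = null;
        }
        return true;
    }

    private void send(boolean fail) {
        if (fail) throw new RuntimeException("Send failed: faulty member");
    }

    public static void main(String[] args) {
        ElectionModel buggy = new ElectionModel(false);
        buggy.startElection(true);  // send fails, marker left set
        System.out.println("buggy retry allowed: " + buggy.startElection(false));

        ElectionModel fixed = new ElectionModel(true);
        fixed.startElection(true);  // send fails, marker cleared
        System.out.println("fixed retry allowed: " + fixed.startElection(false));
    }
}
```

Running this prints `buggy retry allowed: false` and `fixed retry allowed: true`, which is the oscillation described above: the unpatched coordinator keeps forcing elections that are immediately refused.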
Hi Rob, the non-blocking coordinator is still a work in progress. It's one piece of code that got a bit over-complicated once I started developing it, and I think it can be greatly simplified. I will take a look at this at the beginning of next week.

Filip
I made my own coordinator which simply uses a sorted list of getMembers() + getLocalMember(), though it only installs views if the membership remains unchanged for a few seconds, to avoid a little storm of view changes. Obviously it's a much weaker form of view management than you're attempting, but it's probably good enough for my purposes. Let me know when you get to this; I can test it out.
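The scheme described above can be sketched as follows. This is a hypothetical, self-contained illustration of the idea (sorted member list, coordinator = smallest ID, view installed only after membership has been quiet for a hold-off period), not the attached class itself; the class and method names are invented for the sketch.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical "local decision" coordinator: each node sorts the known member
// IDs (getMembers() + getLocalMember() in Tribes terms) and treats the smallest
// as coordinator, but only installs a view once membership has been unchanged
// for a quiet period, avoiding a storm of view changes during churn.
public class LocalCoordinatorSketch {
    private final long quietMillis;
    private TreeSet<String> pending = new TreeSet<>();
    private long stableSince;
    private TreeSet<String> installedView = null;

    public LocalCoordinatorSketch(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Called whenever the membership service reports a change. */
    public void onMembership(List<String> memberIds, long nowMillis) {
        TreeSet<String> sorted = new TreeSet<>(memberIds);
        if (!sorted.equals(pending)) {
            pending = sorted;       // membership changed: restart the clock
            stableSince = nowMillis;
        }
    }

    /** Called periodically; installs a view once membership has been quiet. */
    public void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - stableSince >= quietMillis) {
            installedView = new TreeSet<>(pending);
        }
    }

    /** Coordinator is simply the smallest member ID in the installed view. */
    public String coordinator() {
        return installedView == null ? null : installedView.first();
    }
}
```

Because every node computes the same answer from the same membership list, no election messages are exchanged at all; the trade-offs are that the view lags membership changes by the quiet period, and nodes whose membership services disagree will disagree on the coordinator.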
Created attachment 22179 [details]
An alternative coordinator that makes local decisions based on the membership service.

Happy to release this class under the Apache License. Let me know what you need from me.
Just submit an ICLA (http://www.apache.org/licenses/icla.txt) and email a scanned copy to secretary [at) apache [dot] org
For a contribution of a single class, the statement in comment #4 is more than enough. No need for a CLA.
Many thanks for the patch. I have applied it to trunk and proposed it for 6.0.x. I made the following changes:
- changed package to org.apache.catalina.tribes.group.interceptors
- changed class name to SimpleCoordinator
- added the AL2 text to the beginning of the file
Thanks. I have since moved on to use a custom stack for group membership. I found an excellent paper which describes a robust mechanism for leader election, and which extends that algorithm into a robust group membership protocol as well. http://citeseer.ist.psu.edu/old/496213.html
An updated patch is always welcome.
My comment was misleading. The "custom stack" in question is not based on Tribes at all.
This has been applied to 6.0.x and will be included in 6.0.19 onwards.