Created attachment 22166 [details]
Demonstrate view inconsistency.

In a four-node cluster using NonBlockingCoordinator, if two nodes fail at the same time, the remaining two nodes get different views and never converge. When the other nodes restart, they never install a view at all.

I've attached the relevant demo code. Run it on 4 machines, wait for view installation, then CTRL-C two of them. The other two will never print the same UniqueId. Start a new node, and the view is always null.

Immediately after the two-node failure, one of the surviving nodes issues this stack trace:

WARN - Member send is failing for:tcp://{-64, -88, -91, 34}:4000 ; Setting to suspect and retrying.
ERROR - Error processing coordination message. Could be fatal.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:2 max:1; Faulty members:tcp://{-64, -88, -91, 34}:4000;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:172)
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:78)
        at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
        at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
        at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.NonBlockingCoordinator.handleMyToken(NonBlockingCoordina
So, I understand this better now and have a proposed fix. Here's the procedure to reproduce the problem:

1) Start four nodes.
2) See a view installation with four members.
3) Kill two non-coordinator nodes in quick succession (within a second or two).

From this point onwards, until it is killed, the coordinator oscillates between two states. It recognizes that the state is inconsistent, because it receives heartbeats from the other surviving node and the UniqueId of that node's view does not match the coordinator's. It then forces an election, which fails because it believes an election is already running. This cycle repeats forever.

When the first node crashed, memberDisappeared() was called on the coordinator, which then started sending messages as part of an election. A send threw here with a connection timeout (it was attempting to send to the second node, which had just crashed). This case is never handled, leaving the 'election in progress' marker set. Forever.

Clearing suggestedviewId when the ChannelException is thrown is the fix:

@@ -500,6 +500,7 @@ public class NonBlockingCoordinator extends ChannelInterceptorBase {
             processCoordMessage(cmsg, msg.getAddress());
         }catch ( ChannelException x ) {
             log.error("Error processing coordination message. Could be fatal.",x);
+            suggestedviewId = null;
         }

This probably should only be done under some circumstances, so this isn't obviously a safe patch. Hopefully the author will have a better fix!
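The failure mode above can be modeled in isolation. The following is a minimal, hypothetical sketch (the class and method names are illustrative, not the real Tribes API): a non-null marker means "election in progress", a send during the election may fail, and unless the failure path clears the marker, every subsequent election attempt is refused forever.

```java
// Hypothetical model of the stuck-election bug. "suggestedviewId" here stands
// in for NonBlockingCoordinator's election-in-progress marker; everything else
// is simplified for illustration.
public class ElectionModel {
    private Object suggestedviewId = null;   // non-null = election in progress
    private final boolean clearOnFailure;    // whether the proposed patch is applied

    public ElectionModel(boolean clearOnFailure) {
        this.clearOnFailure = clearOnFailure;
    }

    /** Returns true if a new election was allowed to start. */
    public boolean startElection(boolean sendWillFail) {
        if (suggestedviewId != null) return false; // believes election already running
        suggestedviewId = new Object();            // mark election in progress
        try {
            send(sendWillFail);                    // may throw, like a failed member send
            suggestedviewId = null;                // normal completion clears the marker
        } catch (RuntimeException x) {
            // Without the patch, the marker stays set forever after a failed send.
            if (clearOnFailure) suggestedviewId = null;
        }
        return true;
    }

    private void send(boolean fail) {
        if (fail) throw new RuntimeException("Send failed: faulty member");
    }

    public static void main(String[] args) {
        ElectionModel buggy = new ElectionModel(false);
        buggy.startElection(true);  // send fails, marker left set
        System.out.println("buggy retry allowed: " + buggy.startElection(false));

        ElectionModel fixed = new ElectionModel(true);
        fixed.startElection(true);  // send fails, marker cleared
        System.out.println("fixed retry allowed: " + fixed.startElection(false));
    }
}
```

Running this prints `buggy retry allowed: false` and `fixed retry allowed: true`, which is the oscillation described above: the unpatched coordinator keeps forcing elections that are immediately refused.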
Hi Rob, the non-blocking coordinator is still a work in progress. It's one piece of code that got a bit over-complicated once I started developing it, and I think it can be greatly simplified. I will take a look at this at the beginning of next week.

Filip
I made my own coordinator which simply uses a sorted list of getMembers() + getLocalMember(), though it only installs views if the membership remains unchanged for a few seconds, to avoid a little storm of view changes. Obviously it's a much weaker form of view management than you're attempting, but it's probably good enough for my purposes. Let me know when you get to this; I can test it out.
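The scheme described above can be sketched as follows. This is a hypothetical, self-contained illustration of the idea (sorted member list, coordinator = smallest ID, view installed only after membership has been quiet for a hold-off period), not the attached class itself; the class and method names are invented for the sketch.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical "local decision" coordinator: each node sorts the known member
// IDs (getMembers() + getLocalMember() in Tribes terms) and treats the smallest
// as coordinator, but only installs a view once membership has been unchanged
// for a quiet period, avoiding a storm of view changes during churn.
public class LocalCoordinatorSketch {
    private final long quietMillis;
    private TreeSet<String> pending = new TreeSet<>();
    private long stableSince;
    private TreeSet<String> installedView = null;

    public LocalCoordinatorSketch(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Called whenever the membership service reports a change. */
    public void onMembership(List<String> memberIds, long nowMillis) {
        TreeSet<String> sorted = new TreeSet<>(memberIds);
        if (!sorted.equals(pending)) {
            pending = sorted;       // membership changed: restart the clock
            stableSince = nowMillis;
        }
    }

    /** Called periodically; installs a view once membership has been quiet. */
    public void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - stableSince >= quietMillis) {
            installedView = new TreeSet<>(pending);
        }
    }

    /** Coordinator is simply the smallest member ID in the installed view. */
    public String coordinator() {
        return installedView == null ? null : installedView.first();
    }
}
```

Because every node computes the same answer from the same membership list, no election messages are exchanged at all; the trade-offs are that the view lags membership changes by the quiet period, and nodes whose membership services disagree will disagree on the coordinator.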
Created attachment 22179 [details]
An alternative coordinator that makes local decisions based on the membership service.

Happy to release this class under the Apache License. Let me know what you need from me.
Just submit an ICLA (http://www.apache.org/licenses/icla.txt) and email a scanned copy to secretary [at) apache [dot] org
For a contribution of a single class, the statement in comment #4 is more than enough. No need for a CLA.
Many thanks for the patch. I have applied it to trunk and proposed it for 6.0.x. I made the following changes:
- changed package to org.apache.catalina.tribes.group.interceptors
- changed class name to SimpleCoordinator
- added the AL2 text to the beginning of the file
Thanks. I have since moved on to use a custom stack for group membership. I found an excellent paper which describes a robust mechanism for leader election, and which extends that algorithm into a robust group membership protocol as well. http://citeseer.ist.psu.edu/old/496213.html
An updated patch is always welcome.
My comment was misleading. The "custom stack" in question is not based on Tribes at all.
This has been applied to 6.0.x and will be included in 6.0.19 onwards.