Cassandra / CASSANDRA-16182

A replacement node, although it has completed bootstrap and joined the ring according to itself, is stuck in the Joining state according to its peers


Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 3.0.x
    • Component/s: Cluster/Gossip
    • Labels: None

    Description

      This issue occurred in a production 3.0.21 cluster.

      Here is what happened:

      1. We had, say, a three-node Cassandra cluster with nodes A, B and C.
      2. C got "terminated by cloud provider" due to a health check failure, and a replacement node C' was launched.
      3. C' started bootstrapping data from its neighbors.
      4. Network flaw: nodes A and B were still able to communicate with the terminated node C and consequently still considered C alive.
      5. The replacement node C' learnt about C through gossip but was unable to communicate with C, and marked C as DOWN.
      6. C' completed bootstrapping successfully, and both it and its peers logged "Node C' will complete replacement of C for tokens [-7686143363672898397]".
      7. C' logged "Nodes C' and C have the same token -7686143363672898397. C' is the new owner".
      8. C' started listening for thrift and CQL clients.
      9. Peer nodes A and B logged "Node C' cannot complete replacement of alive node C" (see the sketch after this list).
      10. A few seconds later, A and B marked C as DOWN.
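
      For illustration, the peer-side decision in step 9 boils down to something like the sketch below. This is not the actual Cassandra code and the class/method names are invented; the real check lives in StorageService's gossip state handling.

      import java.net.InetAddress;

      import org.apache.cassandra.gms.IFailureDetector;

      // Invented names; a minimal illustration of why A and B rejected the replacement.
      public final class PeerSideReplacementCheckSketch
      {
          /**
           * Decide whether 'newOwner' (C') may take over a token that the local node
           * currently attributes to 'currentOwner' (C).
           */
          static boolean mayCompleteReplacement(InetAddress newOwner,
                                                InetAddress currentOwner,
                                                IFailureDetector failureDetector)
          {
              // A and B could still reach C, so their failure detector reported C as
              // alive and they refused to let C' take over the token; from their point
              // of view C' never left the Joining state.
              if (failureDetector.isAlive(currentOwner))
                  return false; // "Node C' cannot complete replacement of alive node C"

              return true;
          }
      }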

      C' continued to log the lines below endlessly:

      Node C is now part of the cluster
      Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a log statement fix)
      FatClient C has been silent for 30000ms, removing from gossip
      

      My reasoning about what happened:
      By the time the replacement node C' finished bootstrapping and announced its state as Normal, A and B were still able to communicate with the node being replaced, C (while C' was not able to), and hence rejected C' replacing C. C' does not know this and does not attempt to re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A and B marked C as down soon after.)
      Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C back out once the FailureDetector times out on it.
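
      To make the loop concrete, this is roughly the cycle I believe C' was stuck in. The names below are invented for the sketch; the real eviction is Gossiper's fat-client timeout, not this code.

      import java.net.InetAddress;
      import java.util.HashSet;
      import java.util.Set;

      // Invented names; only a sketch of the cycle, not Cassandra source.
      public final class StuckGossipLoopSketch
      {
          static final long FAT_CLIENT_TIMEOUT_MS = 30_000; // matches the 30000ms in the log

          final Set<InetAddress> knownEndpoints = new HashSet<>();

          // A and B keep gossiping C's state, so C' keeps re-adding C ...
          void onGossip(InetAddress endpoint)
          {
              knownEndpoints.add(endpoint); // "Node C is now part of the cluster"
          }

          // ... but C' cannot reach C, so once C has been silent past the timeout it
          // evicts C again, and the next gossip round restarts the cycle.
          void periodicStatusCheck(InetAddress endpoint, long silentForMs)
          {
              if (silentForMs > FAT_CLIENT_TIMEOUT_MS)
                  knownEndpoints.remove(endpoint); // "FatClient C has been silent for 30000ms, removing from gossip"
          }
      }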

      Proposed fix:
      When C' is notified through gossip about C, then, given that both own the same token and that C' has finished bootstrapping, C' can emit its Normal state again. In my opinion this should fix the issue, as long as A and B have marked C as DOWN (which they eventually did).
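
      A rough sketch of what I have in mind (purely illustrative; the class, method and field names below are invented, this is not a patch against StorageService):

      import java.net.InetAddress;
      import java.util.Collection;

      // Invented names; illustrates the proposed behavior only.
      public final class ReannounceNormalSketch
      {
          interface StateAnnouncer
          {
              void announceNormal(Collection<String> tokens); // re-publish NORMAL + tokens over gossip
          }

          boolean finishedBootstrapping;
          Collection<String> localTokens;
          StateAnnouncer announcer;

          /**
           * Called when gossip tells this node about another endpoint that claims one
           * of our tokens. If we have already finished bootstrapping, re-emit our
           * Normal state so peers that have since marked the old node DOWN can accept
           * the replacement instead of leaving us in Joining forever.
           */
          void onTokenConflict(InetAddress otherEndpoint, String conflictingToken)
          {
              if (finishedBootstrapping && localTokens.contains(conflictingToken))
                  announcer.announceNormal(localTokens);
          }
      }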

      I ended up fixing this manually by restarting Cassandra on C', which forced it to announce its "Normal" state via
      StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> setTokens() --> setGossipTokens()
      Alternatively, I could probably have achieved the same behavior by disabling and re-enabling gossip via JMX/nodetool.
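
      For context, the last step of that chain is what actually re-publishes the node's state over gossip. As best I recall the 3.0 code (so treat the exact body as approximate), the key lines in StorageService.setGossipTokens() are:

      // Approximate fragment of StorageService.setGossipTokens() in 3.0, from memory.
      // 'valueFactory' is StorageService's VersionedValue.VersionedValueFactory and
      // 'tokens' is the node's own token collection.
      Gossiper.instance.addLocalApplicationState(ApplicationState.TOKENS, valueFactory.tokens(tokens));
      Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS, valueFactory.normal(tokens));

      Re-publishing TOKENS together with a NORMAL status is exactly the announcement C' needed to repeat once A and B had marked C DOWN, which is why the restart (or a gossip disable/enable cycle) unwedges their view of the ring.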


          People

            Assignee: Sumanth Pasupuleti
            Reporter: Sumanth Pasupuleti
