Cassandra / CASSANDRA-16182

A replacement node, although it has completed bootstrap and joined the ring according to itself, is stuck in the Joining state according to its peers


Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 3.0.x
    • Component/s: Cluster/Gossip
    • Labels: None

    Description

      This issue occurred in a production 3.0.21 cluster.

      Here is what happened:

      1. We had, say, a three-node Cassandra cluster with nodes A, B and C.
      2. C got "terminated by cloud provider" due to a health check failure, and a replacement node C' was launched.
      3. C' started bootstrapping data from its neighbors.
      4. Network flaw: nodes A and B were still able to communicate with the terminated node C and consequently still considered C alive.
      5. The replacement node C' learnt about C through gossip but was unable to communicate with C, and marked C as DOWN.
      6. C' completed bootstrapping successfully, and both it and its peers logged "Node C' will complete replacement of C for tokens [-7686143363672898397]".
      7. C' logged "Nodes C' and C have the same token -7686143363672898397. C' is the new owner".
      8. C' started listening for thrift and CQL clients.
      9. Peer nodes A and B logged "Node C' cannot complete replacement of alive node C" (see the sketch after this list).
      10. A few seconds later, A and B marked C as DOWN.
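
      For illustration, the peer-side decision in step 9 boils down to something like the sketch below. This is not the actual Cassandra code and the class/method names are invented; the real check lives in StorageService's gossip state handling.

      import java.net.InetAddress;

      import org.apache.cassandra.gms.IFailureDetector;

      // Invented names; a minimal illustration of why A and B rejected the replacement.
      public final class PeerSideReplacementCheckSketch
      {
          /**
           * Decide whether 'newOwner' (C') may take over a token that the local node
           * currently attributes to 'currentOwner' (C).
           */
          static boolean mayCompleteReplacement(InetAddress newOwner,
                                                InetAddress currentOwner,
                                                IFailureDetector failureDetector)
          {
              // A and B could still reach C, so their failure detector reported C as
              // alive and they refused to let C' take over the token; from their point
              // of view C' never left the Joining state.
              if (failureDetector.isAlive(currentOwner))
                  return false; // "Node C' cannot complete replacement of alive node C"

              return true;
          }
      }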

      C' continued to log the lines below endlessly:

      Node C is now part of the cluster
      Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs a log statement fix)
      FatClient C has been silent for 30000ms, removing from gossip
      

      My reasoning about what happened:
      By the time the replacement node C' finished bootstrapping and announced its state as Normal, A and B were still able to communicate with the node being replaced, C (while C' was not able to), and hence rejected C' replacing C. C' does not know this and does not attempt to re-communicate its "Normal" state to the rest of the cluster. (Worth noting that A and B marked C as down soon after.)
      Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C back out once the FailureDetector times out on it.
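
      To make the loop concrete, this is roughly the cycle I believe C' was stuck in. The names below are invented for the sketch; the real eviction is Gossiper's fat-client timeout, not this code.

      import java.net.InetAddress;
      import java.util.HashSet;
      import java.util.Set;

      // Invented names; only a sketch of the cycle, not Cassandra source.
      public final class StuckGossipLoopSketch
      {
          static final long FAT_CLIENT_TIMEOUT_MS = 30_000; // matches the 30000ms in the log

          final Set<InetAddress> knownEndpoints = new HashSet<>();

          // A and B keep gossiping C's state, so C' keeps re-adding C ...
          void onGossip(InetAddress endpoint)
          {
              knownEndpoints.add(endpoint); // "Node C is now part of the cluster"
          }

          // ... but C' cannot reach C, so once C has been silent past the timeout it
          // evicts C again, and the next gossip round restarts the cycle.
          void periodicStatusCheck(InetAddress endpoint, long silentForMs)
          {
              if (silentForMs > FAT_CLIENT_TIMEOUT_MS)
                  knownEndpoints.remove(endpoint); // "FatClient C has been silent for 30000ms, removing from gossip"
          }
      }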

      Proposed fix:
      When C' is notified through gossip about C, then, given that both own the same token and that C' has finished bootstrapping, C' can emit its Normal state again. In my opinion this should fix the issue, as long as A and B have marked C as DOWN (which they eventually did).
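
      A rough sketch of what I have in mind (purely illustrative; the class, method and field names below are invented, this is not a patch against StorageService):

      import java.net.InetAddress;
      import java.util.Collection;

      // Invented names; illustrates the proposed behavior only.
      public final class ReannounceNormalSketch
      {
          interface StateAnnouncer
          {
              void announceNormal(Collection<String> tokens); // re-publish NORMAL + tokens over gossip
          }

          boolean finishedBootstrapping;
          Collection<String> localTokens;
          StateAnnouncer announcer;

          /**
           * Called when gossip tells this node about another endpoint that claims one
           * of our tokens. If we have already finished bootstrapping, re-emit our
           * Normal state so peers that have since marked the old node DOWN can accept
           * the replacement instead of leaving us in Joining forever.
           */
          void onTokenConflict(InetAddress otherEndpoint, String conflictingToken)
          {
              if (finishedBootstrapping && localTokens.contains(conflictingToken))
                  announcer.announceNormal(localTokens);
          }
      }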

      I ended up fixing this manually by restarting Cassandra on C', which forced it to announce its "Normal" state via
      StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> setTokens() --> setGossipTokens()
      Alternatively, I could probably have achieved the same behavior by disabling and re-enabling gossip via JMX/nodetool.
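
      For context, the last step of that chain is what actually re-publishes the node's state over gossip. As best I recall the 3.0 code (so treat the exact body as approximate), the key lines in StorageService.setGossipTokens() are:

      // Approximate fragment of StorageService.setGossipTokens() in 3.0, from memory.
      // 'valueFactory' is StorageService's VersionedValue.VersionedValueFactory and
      // 'tokens' is the node's own token collection.
      Gossiper.instance.addLocalApplicationState(ApplicationState.TOKENS, valueFactory.tokens(tokens));
      Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS, valueFactory.normal(tokens));

      Re-publishing TOKENS together with a NORMAL status is exactly the announcement C' needed to repeat once A and B had marked C DOWN, which is why the restart (or a gossip disable/enable cycle) unwedges their view of the ring.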


          People

            Assignee: Sumanth Pasupuleti
            Reporter: Sumanth Pasupuleti
