  1. Apache Cassandra
  2. CASSANDRA-14559

Check for endpoint collision with hibernating nodes


Details

    Description

      I ran across an edge case when replacing a node with a new node at the same address. The result is that the node (and its tokens) is unsafely removed from gossip.

      Steps to replicate:

      1. Create a 3-node cluster.
      2. Stop a node.
      3. Replace the stopped node with a new node at the same address, using the replace_address flag.
      4. Stop the replacement node before it finishes bootstrapping.
      5. Remove the replace_address flag and restart the node to resume bootstrapping. (If the data directory is also cleared at this point, the node also generates new tokens when it starts.)
      6. Stop the node again before it finishes bootstrapping.
      7. 30 seconds later, the node is removed from gossip because it now matches the FatClient check.
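The effect of step 7 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: the `GossipEntry` class, the `looks_like_fat_client` function, the status strings, and the set of dead states are all hypothetical names chosen for illustration. The model assumes that a peer treats an endpoint as a fat client when its gossip status is not a dead state such as hibernate, it is not a ring member from the peer's point of view, and its gossip entry has not been updated for more than 30 seconds.

```python
from dataclasses import dataclass

# Assumption: the 30 seconds from step 7, modelled as a fixed timeout.
FAT_CLIENT_TIMEOUT_S = 30
# Assumption: a simplified set of "dead" statuses; hibernate is the one that matters here.
DEAD_STATES = {"hibernate", "left", "removed"}


@dataclass
class GossipEntry:
    """Hypothetical, simplified view of what a peer knows about an endpoint."""
    status: str
    is_ring_member: bool
    last_update_s: float


def looks_like_fat_client(entry: GossipEntry, now_s: float) -> bool:
    """Return True if this toy model would evict the endpoint from gossip."""
    if entry.status in DEAD_STATES:
        return False  # dead states are not treated as fat clients
    if entry.is_ring_member:
        return False  # ring members are never treated as fat clients
    return now_s - entry.last_update_s > FAT_CLIENT_TIMEOUT_S


# After step 4: the replacement was stopped while still advertising hibernate.
stopped_while_replacing = GossipEntry("hibernate", is_ring_member=False, last_update_s=0.0)
# After step 6: restarted without replace_address, then stopped mid-bootstrap.
stopped_while_resuming = GossipEntry("boot", is_ring_member=False, last_update_s=0.0)

print(looks_like_fat_client(stopped_while_replacing, now_s=31.0))  # False
print(looks_like_fat_client(stopped_while_resuming, now_s=31.0))   # True
```

In this model the only difference between the two entries is the status: once the node restarts without the replace_address flag it no longer advertises hibernate, so the timeout applies to it.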

      I think this is only an issue when replacing a node with the same address because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the dead node unchanged.

      I believe the simplest fix is to add a check that prevents a non-bootstrapped node (started without the replace_address flag) from starting if there is a gossip entry for the same address in the hibernate state.
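The proposed check could look roughly like the following. This is a minimal sketch of the idea only, not the attached PoC: `check_endpoint_collision`, `EndpointCollisionError`, the plain dictionary of gossip statuses, and the example address are hypothetical stand-ins for whatever the real startup path uses.

```python
class EndpointCollisionError(RuntimeError):
    """Hypothetical error raised when startup must be refused."""


def check_endpoint_collision(address, gossip_status_by_address, *, bootstrapped, replacing):
    """Refuse to start a non-bootstrapped, non-replacing node whose address
    already has a gossip entry in the hibernate state.

    gossip_status_by_address is an assumed, simplified mapping from an
    endpoint address to its last known gossip status string.
    """
    if bootstrapped or replacing:
        return  # the proposed check only targets fresh, non-replacing starts
    if gossip_status_by_address.get(address) == "hibernate":
        raise EndpointCollisionError(
            f"A node at {address} is in the hibernate state; "
            "restart with the replace_address flag to continue the replacement"
        )


# Step 5 of the reproduction: the flag was removed before bootstrap finished.
gossip = {"10.0.0.2": "hibernate"}
try:
    check_endpoint_collision("10.0.0.2", gossip, bootstrapped=False, replacing=False)
except EndpointCollisionError as err:
    print("startup refused:", err)
```

With a check of this shape, step 5 fails fast instead of letting the node overwrite the hibernate entry, while a restart that keeps the replace_address flag still passes.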

      3.11 PoC

       


            People

              stefan.miklosovic Stefan Miklosovic
              VincentWhite Vincent White
              Stefan Miklosovic
              Brandon Williams
              Votes: 0
              Watchers: 7


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m