Cassandra / CASSANDRA-7069

Prevent operator mistakes due to simultaneous bootstrap

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 2.1.1, 2.2.0 beta 1
    • Component/s: None
    • Labels:
      None

      Description

      Cassandra has always had the '2 minute rule': wait two minutes between starting topology changes, so that each range announcement is known to all nodes before the next change begins. Trying to bootstrap a bunch of nodes simultaneously is a common mistake and seems to be on the rise of late.

      We can prevent users from shooting themselves in the foot this way by looking for other joining nodes in the shadow round and comparing their generation against our own. If the difference isn't large enough, we bail out or sleep until it is.
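      The generation check described here could look roughly like the sketch below. It is an illustration only, not the attached patch: it assumes a gossip generation is a node's startup time in epoch seconds, takes the required gap to be the two minutes of the '2 minute rule', and uses hypothetical names (`seconds_to_wait`, `MIN_GENERATION_GAP_SECONDS`).

```python
# Hypothetical sketch: names and the 120 s threshold are assumptions.
MIN_GENERATION_GAP_SECONDS = 120  # the "2 minute rule"


def seconds_to_wait(own_generation, joining_generations,
                    min_gap=MIN_GENERATION_GAP_SECONDS):
    """Return how many seconds short of `min_gap` the closest joining node is.

    Generations are assumed to be startup times in epoch seconds, and
    `joining_generations` holds the generations of the other joining
    nodes seen in the shadow round. A result of 0 means every other
    joining node is at least `min_gap` seconds away from this one.
    """
    wait = 0
    for other in joining_generations:
        gap = abs(own_generation - other)
        if gap < min_gap:
            # Remember the largest shortfall across all joining nodes.
            wait = max(wait, min_gap - gap)
    return wait
```

      A caller could sleep for the returned number of seconds, or treat any non-zero result as a reason to bail out.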

      1. 7069.txt
        1 kB
        Brandon Williams

          Activity

          tjake T Jake Luciani added a comment -

          CASSANDRA-2434 will require one node at a time, no?

          brandon.williams Brandon Williams added a comment -

          Without looking at that patch, will it gracefully handle starting a bunch of nodes up in bootstrap mode at once?

          tjake T Jake Luciani added a comment -

          It will not throw an error but it defeats the purpose of the ticket. I need to think about it deeper, but if two nodes are bootstrapping and fall in the bounds of the original token range then you would, in the end, not have a consistent bootstrap.
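          To make the concern concrete, here is a small sketch that flags joining tokens landing inside the same existing range. It rests on simplifying assumptions that are not from this ticket: one integer token per node, and a range `(previous, token]` owned by the node holding `token`, wrapping around the ring. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def owner_of(token, ring_tokens):
    """Return the existing token whose range (previous, token] contains `token`."""
    ring = sorted(ring_tokens)
    idx = bisect_left(ring, token)
    return ring[idx % len(ring)]  # wrap around past the largest token


def contended_ranges(ring_tokens, joining_tokens):
    """Map each existing range owner to the joining tokens inside its range,
    keeping only ranges targeted by more than one joining token."""
    by_owner = defaultdict(list)
    for t in joining_tokens:
        by_owner[owner_of(t, ring_tokens)].append(t)
    return {owner: sorted(ts) for owner, ts in by_owner.items() if len(ts) > 1}
```

          With existing tokens 0, 100 and 200, joining tokens 30 and 60 both fall in the range owned by 100, which is the situation described above; joining tokens 30 and 150 fall in different ranges.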

          brandon.williams Brandon Williams added a comment -

          > It will not throw an error but it defeats the purpose of the ticket

          We know from experience that telling people "don't do that" isn't good enough... what I'm proposing here is to either not allow it, or sleep long enough that it avoids any issues.

          tjake T Jake Luciani added a comment -

          Right, I agree. What I'm saying is we may need to error on any simultaneous bootstraps; they would need to happen fully one at a time.

          Honestly, I don't understand how the "shadow" round works well enough to know if two bootstraps placed N minutes apart will end up with consistency issues à la CASSANDRA-2434, but I suspect it would be an issue.

          rcoli Robert Coli added a comment -

          > We know from experience that telling people "don't do that" isn't good enough... what I'm proposing here is to either not allow it, or sleep long enough that it avoids any issues.

          +1 this, a lot.

          brandon.williams Brandon Williams added a comment -

          Patch to refuse bootstrapping while other range movements are occurring and cassandra.consistent.rangemovement is true.
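          The behaviour described here can be sketched as a simple guard. This is an illustration, not the contents of 7069.txt: the function name, the exception type, and the three lists of endpoints are hypothetical stand-ins for whatever the node knows about ongoing range movements.

```python
class ConcurrentRangeMovementError(RuntimeError):
    """Raised when a bootstrap is refused because other range movements are in progress."""


def check_can_bootstrap(consistent_rangemovement, bootstrapping, leaving, moving):
    """Refuse to bootstrap while any other node is bootstrapping, leaving or moving.

    `consistent_rangemovement` mirrors the cassandra.consistent.rangemovement
    property; when it is False the check is skipped entirely.
    """
    if not consistent_rangemovement:
        return
    if bootstrapping or leaving or moving:
        raise ConcurrentRangeMovementError(
            "other bootstrapping/leaving/moving nodes detected; "
            "cannot bootstrap while cassandra.consistent.rangemovement is true"
        )
```

          The property is a JVM system property, so it would normally be supplied with a `-D` option such as `-Dcassandra.consistent.rangemovement=false`.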

          tjake T Jake Luciani added a comment -

          +1

          tjake T Jake Luciani added a comment -

          Why not 2.1.1?

          brandon.williams Brandon Williams added a comment -

          I'm just overly cautious until I make the patch. Committed to 2.1.1.

          brandon.williams Brandon Williams added a comment -

          Hmm, it just occurred to me this prevents bootstrapping even after the two minute rule has been followed.

          tjake T Jake Luciani added a comment -

          Perhaps a dtest then

          brandon.williams Brandon Williams added a comment -

          Wait, can we even do multiple bootstraps following the 2 minute rule and get consistent range movement?

          brandon.williams Brandon Williams added a comment -

          Discussing this offline with Jake, we decided it's still possible to violate consistent range movement even following the 2 minute rule, so leaving this as-is. If people don't care, they can simply disable consistent range movement.

          jbellis Jonathan Ellis added a comment -

          Could LWT make bootstrap safer?

          brandon.williams Brandon Williams added a comment -

          I kind of doubt that's worth the effort, since we'd still end up processing the bootstraps serially, unless Jake has some clever idea.

          Anthony Grasso added a comment - edited

          > Discussing this offline with Jake, we decided it's still possible to violate consistent range movement even following the 2 minute rule, so leaving this as-is. If people don't care, they can simply disable consistent range movement.

          Brandon Williams and T Jake Luciani, just wondering how you both worked out that it was possible to violate consistent range movement even after following the 2 minute rule? Is it possible for tokens assigned to a first bootstrapping node to then be reassigned to a second bootstrapping node?


            People

            • Assignee:
              brandon.williams Brandon Williams
              Reporter:
              brandon.williams Brandon Williams
              Reviewer:
              T Jake Luciani
              Tester:
              Philip Thompson
