  1. Cassandra
  2. CASSANDRA-14739

calculatePendingRanges is unsafe when there are multiple concurrent range movements

Details

    • Normal

    Description

      If two nodes that own adjacent portions of the ring are bootstrapped at the same time (i.e. always, for vnodes), they will not receive the correct data for pending writes (and perhaps not for streaming; TBC).

      By default, we don't "permit" multiple nodes to bootstrap at once, but:
       

      1. The logic we use to prevent this isn’t itself strongly consistent (or atomically applied). If two nodes start bootstrapping close together in time, or simply get divergent gossip state, they can both believe there is no other node bootstrapping and proceed.
      2. The bug doesn’t require two nodes to actually bootstrap at the same time; there only needs to be divergent gossip state on a coordinator, so that the coordinator believes multiple nodes are bootstrapping, even though one of them may have completed and they never overlapped in reality.
      3. We can bootstrap and remove nodes concurrently, I think? I’m pretty sure this can also be unsafe, but it needs some more thought.
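
      The check-then-act race in point 1 can be illustrated with a toy model. This is only a sketch, not Cassandra code: the names Node, may_bootstrap, announce and gossip_round are hypothetical, and gossip is reduced to a single merge step. It assumes each node consults only its own local view before deciding to bootstrap.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node with its own, possibly stale, view of who is bootstrapping."""
    name: str
    bootstrapping_view: set = field(default_factory=set)

    def may_bootstrap(self) -> bool:
        # Check step: only the local view is consulted (hypothetical guard).
        return not (self.bootstrapping_view - {self.name})

    def announce(self) -> None:
        # Act step: state is recorded locally; peers learn of it only later.
        self.bootstrapping_view.add(self.name)


def gossip_round(nodes):
    """Merge every node's view into every other node's view."""
    merged = set().union(*(n.bootstrapping_view for n in nodes))
    for n in nodes:
        n.bootstrapping_view = set(merged)


def simulate():
    a, b = Node("A"), Node("B")
    started = []
    # Both nodes run the check before any gossip has been exchanged.
    for node in (a, b):
        if node.may_bootstrap():
            node.announce()
            started.append(node.name)
    # Views converge only after both have already proceeded.
    gossip_round([a, b])
    return started, a.may_bootstrap(), b.may_bootstrap()


started, a_ok, b_ok = simulate()
print(started, a_ok, b_ok)
```

      In this model both nodes pass the guard, and the conflict is only visible once the views have been merged, by which point both bootstraps are already under way. The check and the announcement are not applied atomically across the cluster, which is the property point 1 describes.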

          People

            Unassigned
            Benedict Elliott Smith (benedict)
            Votes: 0
            Watchers: 4
