  1. Cassandra
  2. CASSANDRA-13841

Allow specific sources during rebuild





      CASSANDRA-10406 introduced the ability to rebuild specific ranges, and CASSANDRA-9875 extended that to allow specifying a set of hosts to stream from. It is not obvious why you would want to stream only a subset of ranges, but one possible use case for this functionality is rebuilding a node from targeted replicas.

      When doing a DC migration where the number of racks equals the RF, you can ensure that you rebuild from each copy of a replica in the source datacenter by specifying all the hosts from a single source rack, so that a node rebuilds exactly one copy. Repeating this for each rack in the new datacenter ensures you have each copy of the replica from the source DC, and thus maintains consistency through rebuilds.

      For example, with the following topology for DC A and DC B, with an RF of A:3 and B:3:

      DC A            DC B
      Node  Rack      Node  Rack
      A1    rack1     B1    rack1
      A2    rack2     B2    rack2
      A3    rack3     B3    rack3

      The following set of actions will result in B holding exactly one copy of every replica in A, and B will be at least as consistent as A:

      Rebuild B1 from only A1
      Rebuild B2 from only A2
      Rebuild B3 from only A3
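
The rack-to-rack pairing above can be sketched as a small planning helper. This is only an illustration under the assumption that rack names match between the two datacenters; `plan_rebuild_sources` and the dictionaries are hypothetical names, not part of Cassandra or nodetool.

```python
from collections import defaultdict


def plan_rebuild_sources(source_nodes, target_nodes):
    """Map each target node to the source-DC nodes that share its rack name.

    Both arguments are dicts of node name -> rack name (hypothetical model).
    """
    by_rack = defaultdict(list)
    for node, rack in source_nodes.items():
        by_rack[rack].append(node)

    plan = {}
    for node, rack in target_nodes.items():
        if rack not in by_rack:
            # Without a matching source rack, one copy would be skipped.
            raise ValueError(f"no source nodes in {rack} for {node}")
        plan[node] = sorted(by_rack[rack])
    return plan


dc_a = {"A1": "rack1", "A2": "rack2", "A3": "rack3"}
dc_b = {"B1": "rack1", "B2": "rack2", "B3": "rack3"}

rack_plan = plan_rebuild_sources(dc_a, dc_b)
for target, sources in rack_plan.items():
    print(f"Rebuild {target} from only {', '.join(sources)}")
```

With more than one node per rack, each target node would list every source node in the matching rack, which is the "all the hosts from a single rack" case described earlier.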

      Unfortunately, using this functionality is non-trivial at the moment, as you can only specify sources together with the node's set of tokens to rebuild. To perform the above with vnodes or a large cluster, you have to specify every token range in the -ts argument, which quickly becomes unwieldy or impossible.
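
To give a feel for how large the -ts argument gets, here is a rough sketch that builds the range list for a toy ring. It assumes -ts takes comma-separated `(start,end]` ranges; the helper names, the evenly spaced tokens, and the 3 nodes x 256 vnodes figure are all made up for illustration.

```python
def ring_ranges(tokens):
    """Pair each token with its predecessor on the ring (the first wraps around)."""
    ordered = sorted(tokens)
    return [(ordered[i - 1], ordered[i]) for i in range(len(ordered))]


def format_token_ranges(ranges):
    """Render ranges in the assumed '(start,end],(start,end]' form."""
    return ",".join(f"({start},{end}]" for start, end in ranges)


# Tiny ring: three tokens give three ranges.
small_arg = format_token_ranges(ring_ranges([0, 100, 200]))
print(small_arg)

# Hypothetical cluster: 3 nodes with 256 vnodes each, evenly spaced tokens.
big_ranges = ring_ranges([i * 1000 for i in range(3 * 256)])
big_arg = format_token_ranges(big_ranges)
print(len(big_ranges), "ranges,", len(big_arg), "characters")
```

Real tokens are much longer numbers than the ones used here, so the actual argument would be longer still.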

      A solution to this is to simply filter on sources first, before processing ranges.
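
A minimal sketch of what "filter on sources first" could look like, using a toy model rather than Cassandra's actual streaming code. `filter_sources_first` and the `candidates` map are hypothetical; the idea is only that the allowed-source filter is applied to each range's candidate replicas, so no token ranges need to be listed by hand.

```python
def filter_sources_first(candidates_by_range, allowed_sources):
    """Keep only allowed hosts for every range; fail if a range has none left."""
    allowed = set(allowed_sources)
    plan = {}
    for token_range, candidates in candidates_by_range.items():
        usable = [host for host in candidates if host in allowed]
        if not usable:
            raise ValueError(f"no allowed source for range {token_range}")
        plan[token_range] = usable
    return plan


# With three nodes and RF 3 in DC A, every node is a replica for every range.
candidates = {
    (0, 100): ["A1", "A2", "A3"],
    (100, 200): ["A1", "A2", "A3"],
    (200, 0): ["A1", "A2", "A3"],
}

stream_plan = filter_sources_first(candidates, ["A1"])
for token_range, hosts in stream_plan.items():
    print(token_range, "->", hosts)
```

Raising an error when a range has no allowed source is one possible design choice; it avoids silently rebuilding a node with missing ranges.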




            • Assignee:
              KurtG Kurt Greaves
            • Votes:
              0
            • Watchers:
              3


              • Created: