Details
Bug
Status: Patch Available
Low
Resolution: Unresolved
None
Low
Description
CASSANDRA-10406 introduced the ability to rebuild specific ranges, and CASSANDRA-9875 extended that to allow specifying a set of hosts to stream from. It is not obvious why you would want to stream only a subset of ranges, but one possible use case for this functionality is rebuilding a node from targeted replicas.
When doing a DC migration with racks == RF, you can ensure that you rebuild from every copy of a replica in the source datacenter by specifying all the hosts from a single source rack, so that each rebuild streams exactly one copy. Repeating this for each rack in the new datacenter ensures that you have each copy of the replica from the source DC, which maintains consistency through the rebuilds.
For example, consider the following topology for DCs A and B, with an RF of A:3 and B:3:
Node (DC A) | Rack (DC A) | Node (DC B) | Rack (DC B) |
---|---|---|---|
A1 | rack1 | B1 | rack1 |
A2 | rack2 | B2 | rack2 |
A3 | rack3 | B3 | rack3 |
The following set of actions will leave B with exactly one copy of every replica in A, and B will be at least as consistent as A:
1. Rebuild B1 from only A1
2. Rebuild B2 from only A2
3. Rebuild B3 from only A3
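The rack-by-rack pairing above can be sketched in a few lines of Python. This is only an illustration of the plan, not Cassandra code: the `DC_A` and `DC_B` dictionaries and the `rebuild_plan` helper are hypothetical names, and the sketch assumes that matching racks share the same name in both datacenters, as they do in the table.

```python
# Hypothetical topology copied from the table: node -> rack, per datacenter.
DC_A = {"A1": "rack1", "A2": "rack2", "A3": "rack3"}
DC_B = {"B1": "rack1", "B2": "rack2", "B3": "rack3"}


def rebuild_plan(source_dc, target_dc):
    """Map each target node to the source nodes in the rack of the same name."""
    nodes_by_rack = {}
    for node, rack in source_dc.items():
        nodes_by_rack.setdefault(rack, []).append(node)
    # Assumption: rack names line up one-to-one across the two datacenters.
    return {
        node: sorted(nodes_by_rack.get(rack, []))
        for node, rack in target_dc.items()
    }


plan = rebuild_plan(DC_A, DC_B)
for target, sources in sorted(plan.items()):
    print(f"Rebuild {target} from only {', '.join(sources)}")
```

With more than one node per rack, each target node would be paired with every source node in the matching rack, which corresponds to "specifying all the hosts from a single rack" in the description.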
Unfortunately, using this functionality is non-trivial at the moment, because you can only specify sources together with the node's set of token ranges to rebuild. To perform the above with vnodes or in a large cluster, you have to specify every token range in the -ts argument, which quickly becomes unwieldy or impossible.
A solution to this is to simply filter on sources first, before processing ranges.
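A minimal sketch of the "filter on sources first" idea follows. It is not the patch itself and does not use Cassandra's real classes: `filter_sources_first`, its parameters, and the sample `replicas` map are all hypothetical, chosen only to show the source filter being applied independently of whether ranges were specified.

```python
def filter_sources_first(range_replicas, allowed_sources=None, requested_ranges=None):
    """Pick one streaming source per range, applying the source filter first.

    range_replicas:   dict of (start, end) token range -> candidate source hosts
    allowed_sources:  optional set of hosts; None means any host is acceptable
    requested_ranges: optional list of ranges; None means every known range
    """
    # Step 1: filter on sources, whether or not specific ranges were requested.
    if allowed_sources is not None:
        range_replicas = {
            rng: [host for host in hosts if host in allowed_sources]
            for rng, hosts in range_replicas.items()
        }

    # Step 2: process ranges, defaulting to all of them when none were given.
    ranges = list(range_replicas) if requested_ranges is None else requested_ranges
    plan = {}
    for rng in ranges:
        hosts = range_replicas.get(rng, [])
        if not hosts:
            raise ValueError(f"no permitted source for range {rng}")
        plan[rng] = hosts[0]
    return plan


# Hypothetical replica map for two token ranges owned by the rebuilding node.
replicas = {
    (0, 100): ["A1", "A2", "A3"],
    (100, 200): ["A2", "A3", "A1"],
}
all_from_a1 = filter_sources_first(replicas, allowed_sources={"A1"})
```

Because the ranges default to everything the node owns, the caller only has to name the sources; listing token ranges becomes optional rather than a prerequisite.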