Details
Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
None
Description
The impetus here is this: a node that was down for some period and then comes back can serve stale information. We know from CASSANDRA-768 that we can't just wait for hints, and we know that the tangentially related CASSANDRA-3569 prevents a node that is down (from the FD's POV) from handling streaming.
We can almost set join_ring to false, run a repair, and then join the ring to narrow the window. (In fact, you can do this today and everything succeeds, because the node doesn't know it's a member yet, which is probably a bit of a bug.) If we instead modified the join_ring=false path to put the node in hibernate, as replace_address does, it could work almost like a replace, except that you could manually run a repair while in the hibernate state and then flip to normal when it's done.
This won't eliminate staleness entirely, but it will greatly reduce the chance of it when the node has been down for a significant amount of time.
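To make the proposed sequence concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not Cassandra code: the `Node` class, its `State` enum, and the methods `start`, `repair`, `join`, `serves_reads`, and `may_serve_stale` are all hypothetical names invented for illustration. The model assumes that a node in hibernate is not routed reads by its peers and that a repair can run while it is in that state.

```python
from enum import Enum


class State(Enum):
    DOWN = "down"
    HIBERNATE = "hibernate"  # assumed: process is up, but peers route no reads to it
    NORMAL = "normal"        # full ring member, serves reads


class Node:
    """Toy model of the proposed rejoin sequence (hypothetical, not Cassandra's code)."""

    def __init__(self):
        self.state = State.DOWN
        self.repaired = False

    def start(self, join_ring=True):
        # Proposed behaviour: join_ring=False brings the node up in hibernate
        # instead of letting it go straight to normal.
        self.state = State.NORMAL if join_ring else State.HIBERNATE
        self.repaired = False  # data written while the node was down is still missing

    def repair(self):
        # In this model a repair needs the process to be running.
        if self.state is State.DOWN:
            raise RuntimeError("cannot repair a node that is down")
        self.repaired = True

    def join(self):
        # Flip from hibernate to normal once the operator is satisfied.
        if self.state is not State.HIBERNATE:
            raise RuntimeError("join is only meaningful from hibernate")
        self.state = State.NORMAL

    def serves_reads(self):
        return self.state is State.NORMAL

    def may_serve_stale(self):
        # Stale reads are possible when the node serves reads before being repaired.
        return self.serves_reads() and not self.repaired


# Proposed sequence: start in hibernate, repair manually, then join.
node = Node()
node.start(join_ring=False)
node.repair()
node.join()
```

In this model a node started with the default `join_ring=True` serves reads immediately and may return stale data, while the hibernate, repair, join sequence only starts serving reads after the repair has run. The model ignores writes that arrive during or after the repair, which is why the real window is narrowed rather than closed.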