Details
Bug
Status: Open
Normal
Resolution: Unresolved
None
Correctness - Consistency
Normal
Normal
User Report
All
None
Description
While bringing up a 6-node ccm cluster to test 4.0-beta4, two nodes chose the same tokens using the default allocate_tokens_for_local_rf. Both nevertheless completed bootstrap successfully with colliding tokens.
We were already familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079; the workaround is to avoid parallel bootstrap when using allocate_tokens_for_local_rf.
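To illustrate why parallel bootstrap can collide, here is a minimal toy sketch. It is not Cassandra's actual allocation algorithm: `allocate_token`, `RING_SIZE`, and the "midpoint of the largest gap" rule are assumptions made purely for illustration. The point is only that any deterministic allocator fed the same ring view returns the same token, so two nodes that have not yet seen each other pick identical tokens.

```python
# Toy model only: NOT Cassandra's real token allocation algorithm.
RING_SIZE = 2**64  # hypothetical token space size, chosen for illustration


def allocate_token(ring_view):
    """Deterministically pick the midpoint of the largest gap in the ring."""
    tokens = sorted(ring_view)
    if not tokens:
        return 0
    best_gap, best_start = -1, None
    for i, start in enumerate(tokens):
        end = tokens[(i + 1) % len(tokens)]
        # A single-token ring has one gap that spans the whole ring.
        gap = (end - start) % RING_SIZE or RING_SIZE
        if gap > best_gap:
            best_gap, best_start = gap, start
    return (best_start + best_gap // 2) % RING_SIZE


existing = {0, 2**62, 2**63}

# Parallel bootstrap: both new nodes see the same ring view.
token_a = allocate_token(existing)
token_b = allocate_token(existing)
collision = token_a == token_b  # same input, same output

# Serial bootstrap: the second node already sees the first node's token.
token_b_serial = allocate_token(existing | {token_a})
```

In the serial case the second node's view already contains the first node's token, so the allocator chooses a different gap, which is why avoiding parallel bootstrap works as a workaround.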
However, because this is the default behavior, we should try to detect and prevent this situation where possible, as it can break users who rely on parallel bootstrap.
I think we could prevent this as follows:
1. announce intent to bootstrap via gossip (i.e. add the node to gossip without token information)
2. wait for gossip to settle for a longer period (i.e. ring delay)
3. allocate tokens (if multiple bootstrap attempts are detected, tie break via node-id)
4. broadcast tokens and move on with bootstrap
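The four steps above can be sketched as a small simulation. This is one possible reading of the proposal, not an implementation: `Gossip`, `announce_intent`, `try_allocate`, and the toy midpoint allocator are all hypothetical names, gossip is modelled as an already-settled shared object, and "tie break via node-id" is interpreted here as "the lowest pending node id allocates first, the others retry later".

```python
from dataclasses import dataclass, field

RING_SIZE = 2**64  # hypothetical token space size, for illustration only


@dataclass
class Gossip:
    """Hypothetical cluster state, assumed to have fully settled."""
    tokens: dict = field(default_factory=dict)  # node_id -> token
    pending: set = field(default_factory=set)   # nodes that announced intent


def midpoint_of_largest_gap(tokens):
    """Toy deterministic allocator (not Cassandra's real algorithm)."""
    ring = sorted(tokens)
    if not ring:
        return 0
    best_gap, best_start = -1, None
    for i, start in enumerate(ring):
        end = ring[(i + 1) % len(ring)]
        gap = (end - start) % RING_SIZE or RING_SIZE
        if gap > best_gap:
            best_gap, best_start = gap, start
    return (best_start + best_gap // 2) % RING_SIZE


def announce_intent(node_id, gossip):
    # Step 1: join gossip without token information.
    gossip.pending.add(node_id)


def try_allocate(node_id, gossip):
    # Step 2 (waiting for ring delay) is elided; gossip is assumed settled.
    # Step 3: tie-break, only the lowest pending node id may allocate now.
    if min(gossip.pending) != node_id:
        return None  # another bootstrap was detected; retry later
    token = midpoint_of_largest_gap(gossip.tokens.values())
    # Step 4: broadcast the token and continue with bootstrap.
    gossip.tokens[node_id] = token
    gossip.pending.discard(node_id)
    return token


gossip = Gossip(tokens={"n1": 0, "n2": 2**62, "n3": 2**63})
announce_intent("n5", gossip)
announce_intent("n4", gossip)

first = try_allocate("n5", gossip)   # loses the tie-break to n4
second = try_allocate("n4", gossip)  # lowest pending id, allocates
third = try_allocate("n5", gossip)   # now sees n4's token, allocates
```

Because the losing node retries only after the winner has broadcast its token, its ring view differs from the winner's and the deterministic allocator no longer returns the same token.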
Issue Links
- is caused by
  - CASSANDRA-13701 Lower default num_tokens (Resolved)
- is duplicated by
  - CASSANDRA-19644 deterministic token allocation combined with slow gossip propogation can lead to data loss (Resolved)