Details
Bug
Status: Resolved
Low
Resolution: Fixed
None
None
Low
Description
Easy way to reproduce:
Start node A.
Start node B, with autobootstrap=false.
Kill B, wipe data dir, and restart (still w/ autobootstrap=false).
A will show B as down, with its old token. (B will see both nodes correctly.)
This appears to be because wiping the data dir makes the generation restart at 1. (This is not just operator error; apart from testing, it could arise if a node dies completely and has to be replaced.) A then ignores B's gossip state until B's new heartbeat is larger than the one it had previously reached.
It appears that initializing the generation to seconds-since-epoch would fix this.
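A minimal sketch of why this would help, under a simplified model that is an assumption and not the actual gossip code: a peer keeps the last (generation, heartbeat) pair it saw for a node and accepts new state only if the incoming pair is larger, comparing generation first. The names `is_newer` and `initial_generation` are hypothetical and exist only for illustration.

```python
import time


def is_newer(known, incoming):
    """Return True if the incoming (generation, heartbeat) pair supersedes
    the known one. Tuples compare generation first, then heartbeat."""
    return incoming > known


def initial_generation(now=None):
    """Proposed fix: seed the generation from seconds since the epoch, so a
    node that lost its data dir still starts with a larger generation."""
    return int(time.time() if now is None else now)


# State that A remembers for B before B's data dir was wiped (made-up values).
known_b = (1, 500)

# B restarts with the generation back at 1: A ignores it until the heartbeat
# passes 500.
restarted_counter = (1, 1)

# B restarts with an epoch-seeded generation: A accepts it immediately.
restarted_epoch = (initial_generation(), 1)

print(is_newer(known_b, restarted_counter))
print(is_newer(known_b, restarted_epoch))
```

Under this model the epoch-seeded generation is larger than any earlier one as long as the node's clock does not go backwards between restarts.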