Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
2.0.4-alpha
None
Reviewed
Description
On one cluster, we saw the following sequence of events cause the replicas with the most recent genstamp of a block to be lost:
- A client is writing to a pipeline of 3 nodes.
- Nodes in the pipeline fail over some period of time. This leaves 3 old-genstamp replicas on the original 3 nodes, while the pipeline recruits 3 new replicas with a later genstamp.
- So there are 6 replicas in total in the cluster: 3 with old genstamps on the downed nodes, and 3 with the latest genstamp.
- The cluster reboots, and the nodes with old genstamps send their block reports first. Their replicas are correctly added to the corrupt replicas map, since their genstamp is too old.
- The nodes with the new genstamp then send their block reports. When the last of them reports, chooseExcessReplicates is called and incorrectly decides to remove the 3 good replicas, leaving only the old-genstamp replicas.
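The sequence above can be replayed in a small toy model. This is only a sketch, not the actual BlockManager code: the class GenstampScenario, its methods, the node names dn1 to dn6, and the genstamp values 1 and 2 are all hypothetical. The description does not say why chooseExcessReplicates picked the good replicas, so the sketch assumes one plausible failure mode: corrupt replicas are still counted toward the replica total, while removal candidates are drawn only from the non-corrupt ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the scenario; every name here is hypothetical, not HDFS API.
final class GenstampScenario {
    static final int REPLICATION = 3;

    /** Hypothetical replica record: the reporting node and its generation stamp. */
    static final class Replica {
        final String node;
        final long genstamp;

        Replica(String node, long genstamp) {
            this.node = node;
            this.genstamp = genstamp;
        }
    }

    /** Replicas in block-report order: old-genstamp nodes first, then the new ones. */
    static List<Replica> scenario() {
        List<Replica> reported = new ArrayList<>();
        for (int i = 1; i <= 3; i++) reported.add(new Replica("dn" + i, 1L));
        for (int i = 4; i <= 6; i++) reported.add(new Replica("dn" + i, 2L));
        return reported;
    }

    /** A replica whose genstamp is older than the latest one is marked corrupt. */
    static Set<String> findCorrupt(List<Replica> reported, long latestGenstamp) {
        Set<String> corrupt = new HashSet<>();
        for (Replica r : reported) {
            if (r.genstamp < latestGenstamp) corrupt.add(r.node);
        }
        return corrupt;
    }

    /**
     * Picks replicas to delete once all nodes have reported.
     * ASSUMPTION: with countCorruptAsLive == true the corrupt replicas inflate
     * the count, yet only non-corrupt replicas are eligible for removal.
     */
    static List<String> chooseExcess(List<Replica> reported, Set<String> corrupt,
                                     boolean countCorruptAsLive) {
        int live = countCorruptAsLive
                ? reported.size()
                : reported.size() - corrupt.size();
        int excess = Math.max(0, live - REPLICATION);
        List<String> toRemove = new ArrayList<>();
        for (Replica r : reported) {
            if (toRemove.size() == excess) break;
            if (!corrupt.contains(r.node)) toRemove.add(r.node);
        }
        return toRemove;
    }

    public static void main(String[] args) {
        List<Replica> reported = scenario();
        Set<String> corrupt = findCorrupt(reported, 2L);
        // Miscounting variant: all three latest-genstamp replicas are chosen.
        System.out.println(chooseExcess(reported, corrupt, true));
        // Variant that excludes corrupt replicas from the count: nothing is chosen.
        System.out.println(chooseExcess(reported, corrupt, false));
    }
}
```

In this model, counting the 3 corrupt replicas as live makes the block look over-replicated by 3, and the only eligible candidates are the 3 good replicas. Excluding corrupt replicas from the count leaves exactly 3 live replicas and nothing to remove.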