Details
- Bug
- Status: Resolved
- Normal
- Resolution: Fixed
- Correctness - API / Semantic Definition
- Critical
- Normal
- Fuzz Test
- All
- None
Description
When a node is bootstrapping or performing a host replacement, it sees several epochs before it actually joins the ring. Accord, however, only synchronizes epoch knowledge with nodes that have already joined, so the epochs seen on the new nodes are never synchronized. This is a problem because it forces these nodes to include far more epochs than required (they do not know whether their peers know the epoch), and they may include stale epochs that cannot reach quorum (for example, two host replacements for the same range would leave that historic range unable to reach quorum).
By sequencing EpochReady.coordinating, we get the property that sync is marked complete for epoch=N only once epoch=N-1 has been marked complete as well. With this, peers are able to recover the past data when a new epoch is seen.
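The ordering property described above can be illustrated with a minimal sketch. This is not Accord's implementation: the class `EpochSyncSequencer` and its methods are hypothetical names invented for this example, and the `coordinating` future only stands in for whatever EpochReady.coordinating represents. The sketch assumes epochs are registered in increasing order and chains each epoch behind its predecessor, so epoch N is never marked complete before epoch N-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; none of these names come from Accord.
final class EpochSyncSequencer {
    // Epochs in the order they were marked sync complete.
    private final List<Long> completed = new ArrayList<>();
    // Completes once every epoch registered so far has been marked sync complete.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);
    private long lastRegistered = Long.MIN_VALUE;

    /**
     * Registers an epoch whose own coordination work is done when
     * {@code coordinating} completes. The returned future completes only
     * after all earlier epochs and this epoch have been marked complete.
     */
    synchronized CompletableFuture<Void> register(long epoch, CompletableFuture<Void> coordinating) {
        if (epoch <= lastRegistered)
            throw new IllegalArgumentException("epochs must be registered in increasing order");
        lastRegistered = epoch;
        // Wait for the previous epoch first, then for this epoch's own work.
        tail = tail.thenCompose(ignored -> coordinating)
                   .thenRun(() -> markSyncComplete(epoch));
        return tail;
    }

    private synchronized void markSyncComplete(long epoch) {
        completed.add(epoch);
    }

    synchronized List<Long> completedEpochs() {
        return new ArrayList<>(completed);
    }
}
```

In this sketch, if the work for epoch 2 finishes before the work for epoch 1, epoch 2 stays pending until epoch 1 is marked complete, so an observer that sees epoch N marked complete can assume the same of every earlier registered epoch.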
Issue Links
- is blocked by
- CASSANDRA-19790 Add an ability to reconstruct arbitrary epoch state from the log to TCM (Resolved)
- is related to
- CASSANDRA-19855 txns that update a static row when the desired row doesn't exist leads to an error (Resolved)
- CASSANDRA-19857 CommandsForRanges does not support slice which cause over returned data being sent (Resolved)
- CASSANDRA-19851 Create a new test kind focused on long running tests that rely on randomized input (Open)
- CASSANDRA-19838 Add a table to inspect the current state of a txn (Resolved)
- CASSANDRA-19847 Create a fuzz test that randomizes topology changes, cluster actions, and CQL operations (Resolved)
- CASSANDRA-19856 Add a concept for retrying messages (Resolved)