[CASSANDRA-19769] CEP-15: (Accord) sequence EpochReady.coordinating to allow syncComplete to be learned from newer epochs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: NA
Component/s: Accord
Labels:
- pull-request-available

Bug Category:
Correctness - API / Semantic Definition
Severity:
Critical
Complexity:
Normal
Discovered By:
Fuzz Test
Platform:

All
Impacts:

None
Since Version:

NA
Source Control Link:

https://github.com/apache/cassandra/commit/a8f32d0a4c586431b7aa955ff0493166b771bcff
Test and Documentation Plan:

new tests

Description

When a node is bootstrapping or performing a host replacement, it sees several epochs before it actually joins the ring. In Accord, however, we only synchronize epoch knowledge to nodes that have already joined, so the epochs seen on the new nodes are never synchronized. This is a problem because it forces these nodes to include far more epochs than required (they do not know whether their peers know the epoch), and those may include stale epochs for which quorum cannot be reached (for example, two host replacements on the same range would leave that historic range unable to reach quorum).

By sequencing EpochReady.coordinating, we guarantee that sync is marked complete for epoch=N only after it has been marked complete for epoch=N-1. With this property, when a peer sees a new epoch it can recover the state of the past epochs, since syncComplete for them can be learned from the newer epoch.
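
The ordering property described above can be illustrated with a minimal sketch. This is not the Accord implementation: the class `SequencedEpochSync` and its methods `register`, `markSyncComplete`, and `completedEpochs` are hypothetical names chosen for illustration, and the sketch assumes epochs are registered in increasing order. It chains each epoch's "coordinating" future behind the previous epoch's, so epoch N is marked sync complete only after epoch N-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; names do not come from the Accord code base.
final class SequencedEpochSync {
    // Epochs marked sync complete, in the order they were marked.
    private final List<Long> completed = new ArrayList<>();
    // Tail of the chain: completes once the most recently registered epoch is marked.
    private CompletableFuture<Void> previous = CompletableFuture.completedFuture(null);

    // Assumption: callers register epochs in strictly increasing order.
    synchronized CompletableFuture<Void> register(long epoch, CompletableFuture<Void> coordinating) {
        CompletableFuture<Void> sequenced = previous
            // Wait for the previous epoch to be marked, then for this epoch's own readiness.
            .thenCompose(ignored -> coordinating)
            .thenRun(() -> markSyncComplete(epoch));
        previous = sequenced;
        return sequenced;
    }

    private synchronized void markSyncComplete(long epoch) {
        completed.add(epoch);
    }

    synchronized List<Long> completedEpochs() {
        return new ArrayList<>(completed);
    }
}
```

In this sketch, if epoch 2 becomes ready before epoch 1, nothing is marked until epoch 1 is ready; both are then marked in order. A consequence of this design is that seeing epoch N marked complete implies every earlier registered epoch is complete too. Note that a failure in one epoch's future propagates to all later epochs in the chain, which a real implementation would need to handle.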

Attachments

Issue Links

is blocked by

CASSANDRA-19790 Add an ability to reconstruct arbitrary epoch state from the log to TCM

Resolved

is related to

CASSANDRA-19855 txns that update a static row when the desired row doesn't exist leads to an error

Resolved

CASSANDRA-19857 CommandsForRanges does not support slice which cause over returned data being sent

Resolved

CASSANDRA-19851 Create a new test kind focused on long running tests that rely on randomized input

Open

CASSANDRA-19838 Add a table to inspect the current state of a txn

Resolved

CASSANDRA-19847 Create a fuzz test that randomizes topology changes, cluster actions, and CQL operations

Resolved

CASSANDRA-19856 Add a concept for retrying messages

Resolved

links to

CI: Accord

GH: accord trunk

GH: cep-15-accord

GitHub Pull Request #103

GitHub Pull Request #3416

Activity

People

Assignee: David Capwell

Reporter: David Capwell

Authors: David Capwell

Reviewers: Alex Petrov, Blake Eggleston

Votes: 0

Watchers: 2

Dates

Created: 12/Jul/24 00:19

Updated: 27/Sep/24 23:25

Resolved: 27/Sep/24 23:25

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

18h 10m