[CASSANDRA-18824] Backport CASSANDRA-16418: Cleanup behaviour during node decommission caused missing replica - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.30, 3.11.17, 4.0.13, 4.1.4, 5.0-rc1, 5.0, 5.1
Component/s: Consistency/Bootstrap and Decommission
Labels: None
Bug Category: Correctness - Unrecoverable Corruption / Loss
Platform: All
Since Version: 3.0.0
Source Control Link: https://github.com/apache/cassandra/commit/5be57829b03ef980933ba52ecc0549787f653da4
Test and Documentation Plan: CI, dtest

Description

Node decommission triggers data transfer to other nodes. While this transfer is in progress, receiving nodes temporarily hold token ranges in a pending state. However, the cleanup process currently does not consider these pending ranges when calculating token ownership. As a consequence, data that is already stored in sstables gets inadvertently cleaned up.

Steps to reproduce:

1. Create a two-node cluster.
2. Create a keyspace with RF=1.
3. Insert sample data (assert that the data is available when querying both nodes).
4. Start the decommission process on node 1.
5. Run cleanup in a loop on node 2 until the decommission of node 1 finishes.
6. Verify that all rows are still in the cluster. This check fails because the previous step removed some of the rows.

It seems that the cleanup process does not take pending ranges into account and uses only the local ranges: https://github.com/apache/cassandra/blob/caad2f24f95b494d05c6b5d86a8d25fbee58d7c2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L466.
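To illustrate the failure mode, here is a minimal Python sketch. It is a simplified model and not Cassandra code: tokens are plain integers, ranges are `(left, right]` pairs without wraparound, and the names `owns`, `cleanup`, `local_ranges`, and `pending_ranges` are hypothetical and chosen only for this example.

```python
# Simplified model, not Cassandra code: a range (left, right] owns every
# integer token t with left < t <= right. Wraparound ranges are ignored.

def owns(ranges, token):
    """Return True if any (left, right] range contains the token."""
    return any(left < token <= right for left, right in ranges)


def cleanup(sstable_tokens, keep_ranges):
    """Keep only the tokens covered by keep_ranges, dropping the rest."""
    return [t for t in sstable_tokens if owns(keep_ranges, t)]


# Node 2 owns (0, 50]. Node 1 is decommissioning and streams (50, 100]
# to node 2, so that range is pending on node 2 until the transfer ends.
local_ranges = [(0, 50)]
pending_ranges = [(50, 100)]

# Node 2 already holds rows from its own range and some streamed rows.
sstable_tokens = [10, 40, 60, 90]

# Ownership computed from local ranges only: the streamed rows are dropped.
local_only = cleanup(sstable_tokens, local_ranges)

# Ownership that also includes pending ranges: the streamed rows survive.
with_pending = cleanup(sstable_tokens, local_ranges + pending_ranges)

print(local_only)    # [10, 40]
print(with_pending)  # [10, 40, 60, 90]
```

With RF=1 the rows at tokens 60 and 90 have no other replica once node 1 leaves the ring, which is why the final verification step fails.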

There are two possible solutions to the problem.

One would be to change the cleanup process so that it starts taking pending ranges into account. Even though that might sound tempting at first, it would require involved changes and a lot of testing effort.

Alternatively, we could interrupt the cleanup process, or prevent it from running, whenever any pending range is detected on the node. That sounds like a reasonable solution and is relatively easy to implement.
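The second option can be sketched as a simple guard. This is a hedged illustration in Python, not the code from CASSANDRA-16418: `CleanupRefused`, `run_cleanup`, and `fake_cleanup` are hypothetical names, and the real fix may differ in where and how it performs the check.

```python
class CleanupRefused(Exception):
    """Raised when cleanup is requested while the node has pending ranges."""


def run_cleanup(keyspace, pending_ranges, do_cleanup):
    """Run do_cleanup(keyspace) only if the node has no pending ranges."""
    # Hypothetical guard: refuse rather than risk deleting streamed data.
    if pending_ranges:
        raise CleanupRefused(
            f"Node has pending ranges for keyspace {keyspace}; cleanup not run"
        )
    return do_cleanup(keyspace)


def fake_cleanup(keyspace):
    """Stand-in for the real cleanup work, used only for this example."""
    return f"cleaned {keyspace}"


# While a range is pending, the request is rejected.
refused = False
try:
    run_cleanup("ks", [(50, 100)], fake_cleanup)
except CleanupRefused:
    refused = True

# Once no ranges are pending, cleanup proceeds as usual.
result = run_cleanup("ks", [], fake_cleanup)
```

Refusing outright keeps the ownership calculation untouched, which matches the reasoning above that this option needs far less testing than making cleanup aware of pending ranges.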

The bug has already been fixed in 4.x with CASSANDRA-16418. The goal of this ticket is to backport that fix to 3.x.

Attachments

Issue Links

Blocked

CASSANDRA-18823 Cleanup behaviour during node decommission caused missing replica

Resolved

CASSANDRA-16418 Unsafe to run nodetool cleanup during bootstrap or decommission

Resolved

is related to

CASSANDRA-18863 Some tests are depending on each other

Triage Needed

Testing discovered

CASSANDRA-19363 Weird data loss in 3.11 flakiness during decommission

Triage Needed

CASSANDRA-19364 Data loss during decommission possible due to a delayed and unsynced pending ranges calculation

Triage Needed

links to

GitHub Pull Request #2921

GitHub Pull Request #3022

GitHub Pull Request #3023

GitHub Pull Request #3024

GitHub Pull Request #3025

GitHub Pull Request #3026

GitHub Pull Request #3027

Activity

People

Assignee: Szymon Miezal

Reporter: Szymon Miezal

Authors: Szymon Miezal

Reviewers: Brandon Williams, Jacek Lewandowski

Votes: 0

Watchers: 5

Dates

Created: 06/Sep/23 08:22

Updated: 11/Apr/24 11:37

Resolved: 07/Feb/24 14:20

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

2h 40m