[CASSANDRA-15601] Ensure repaired data tracking reads a consistent amount of data across replicas - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0-beta1, 4.0
Component/s: Consistency/Repair
Labels: None
Bug Category: Correctness - Transient Incorrect Response
Severity: Normal
Complexity: Normal
Discovered By: Code Inspection
Platform: All
Impacts: None
Since Version: 4.0-alpha
Source Control Link: https://github.com/apache/cassandra/commit/a8e7cfbc0e146ea82154654ba43b613b058f99d1
Test and Documentation Plan: Added new in-jvm-dtests and unit tests

Description

When generating a digest for repaired data tracking, the amount of repaired data that needs to be read may depend on the unrepaired data on the replica. As this may vary between replicas, digest mismatches can be reported even though the repaired data may actually be in sync.

For example, consider two replicas, A and B, and a table like:

CREATE TABLE tbl (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER BY (ck DESC);

Unrepaired
===========
Instance A
(0, 5)

Instance B
(0, 6)
(0, 5)


Repaired (Both A & B)
=========
(0, 4)
(0, 3)
(0, 2)
(0, 1)
(0, 0)

SELECT * FROM tbl WHERE pk = 0 LIMIT 3;

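Instance A would read (0, 5) from its unrepaired set and (0, 4), (0, 3) from the repaired set.
Instance B would read (0, 6), (0, 5) from its unrepaired set and just (0, 4) from the repaired set.
The two replicas therefore compute their repaired data digests over different rows, and the digests do not match.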
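
The mismatch above can be reproduced with a small model. This is only an illustrative sketch, not Cassandra code: the function names read_with_limit and digest are hypothetical, rows are reduced to their ck values for pk = 0, and SHA-256 stands in for whatever digest the real read path uses.

```python
import hashlib

def read_with_limit(unrepaired, repaired, limit):
    """Merge rows in descending ck order and stop once `limit` rows are returned.

    Hypothetical model: returns (result_rows, repaired_rows_read).
    """
    merged = sorted(
        [(ck, "unrepaired") for ck in unrepaired]
        + [(ck, "repaired") for ck in repaired],
        reverse=True,
    )[:limit]
    result = [ck for ck, _ in merged]
    repaired_read = [ck for ck, source in merged if source == "repaired"]
    return result, repaired_read

def digest(cks):
    """Digest over the repaired rows a replica happened to read."""
    h = hashlib.sha256()
    for ck in cks:
        h.update(f"0:{ck};".encode())
    return h.hexdigest()

repaired = [4, 3, 2, 1, 0]  # identical on both replicas
result_a, repaired_a = read_with_limit([5], repaired, 3)
result_b, repaired_b = read_with_limit([6, 5], repaired, 3)

print(result_a, repaired_a)  # [5, 4, 3] [4, 3]
print(result_b, repaired_b)  # [6, 5, 4] [4]
print(digest(repaired_a) == digest(repaired_b))  # False
```

Both replicas hold the same repaired rows, yet the digests differ purely because the unrepaired rows consumed different shares of the LIMIT.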

Unrepaired row, range or partition tombstones that shadow repaired data and are present on some replicas but not others have the opposite effect: a replica holding such tombstones reads more repaired data than a replica without them.
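
A minimal sketch of the tombstone case, under the same simplifying assumptions as the example table (rows reduced to ck values for pk = 0). The function read_with_tombstones and the choice of a row tombstone on (0, 4) present only on replica A are hypothetical, chosen for illustration.

```python
def read_with_tombstones(repaired, unrepaired_tombstones, limit):
    """Read repaired rows in descending ck order until `limit` live rows are found.

    Hypothetical model: a repaired row shadowed by an unrepaired tombstone is
    still read from disk, but it does not count towards the limit.
    """
    result, repaired_read = [], []
    for ck in sorted(repaired, reverse=True):
        if len(result) == limit:
            break
        repaired_read.append(ck)  # the row is read from the repaired set
        if ck not in unrepaired_tombstones:
            result.append(ck)     # only live rows count towards LIMIT
    return result, repaired_read

repaired = [4, 3, 2, 1, 0]  # identical on both replicas
result_a, repaired_a = read_with_tombstones(repaired, {4}, 3)    # A has the tombstone
result_b, repaired_b = read_with_tombstones(repaired, set(), 3)  # B does not

print(result_a, repaired_a)  # [3, 2, 1] [4, 3, 2, 1]
print(result_b, repaired_b)  # [4, 3, 2] [4, 3, 2]
```

Replica A has to read four repaired rows to produce three live ones, so a digest over everything it read would again differ from B's.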

To fix this, when repaired data tracking is in effect, each replica needs to overread during a full data read. Replicas should read up to LIMIT (i.e. the DataLimits of the ReadCommand) from the repaired set, regardless of how much is read from the unrepaired data. Once that amount of repaired data has been read, the replica should stop updating the digest. So if unrepaired tombstones cause more than LIMIT repaired data to be read, the digest is calculated only over the first LIMIT-worth of repaired data.
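
The proposed behaviour can be sketched as follows. This is a simplified model rather than the committed implementation: tracked_read is a hypothetical name, rows are reduced to ck values for pk = 0, every repaired row read is assumed to count towards the repaired limit, and a given ck is assumed to appear in only one of the two sets.

```python
import hashlib

def tracked_read(unrepaired, tombstones, repaired, limit):
    """Hypothetical model of a tracked read that overreads the repaired set.

    The result still honours `limit`, but reading continues until `limit`
    repaired rows have been digested, and the digest stops updating there.
    """
    rows = sorted(
        [(ck, False) for ck in unrepaired] + [(ck, True) for ck in repaired],
        reverse=True,
    )
    result, digested = [], []
    for ck, is_repaired in rows:
        if len(result) >= limit and len(digested) >= limit:
            break
        if is_repaired and len(digested) < limit:
            digested.append(ck)  # digest covers only the first LIMIT repaired rows
        if ck not in tombstones and len(result) < limit:
            result.append(ck)    # response still returns at most LIMIT live rows
    digest = hashlib.sha256(",".join(map(str, digested)).encode()).hexdigest()
    return result, digest

repaired = [4, 3, 2, 1, 0]

# Differing unrepaired rows: A has (0, 5), B has (0, 6) and (0, 5).
rows_a, digest_a = tracked_read([5], set(), repaired, 3)
rows_b, digest_b = tracked_read([6, 5], set(), repaired, 3)
print(rows_a, rows_b, digest_a == digest_b)  # [5, 4, 3] [6, 5, 4] True

# Unrepaired tombstone on (0, 4) present only on A.
rows_c, digest_c = tracked_read([], {4}, repaired, 3)
rows_d, digest_d = tracked_read([], set(), repaired, 3)
print(rows_c, rows_d, digest_c == digest_d)  # [3, 2, 1] [4, 3, 2] True
```

In both scenarios the responses still differ, as expected for unrepaired data, while the repaired digests agree because each replica digests the same first three repaired rows.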

Issue Links

relates to

CASSANDRA-14145 Detecting data resurrection during read

Resolved

People

Assignee: Sam Tunnicliffe

Reporter: Sam Tunnicliffe

Authors: Sam Tunnicliffe

Reviewers: Aleksey Yeschenko

Votes: 0

Watchers: 4

Dates

Created: 26/Feb/20 14:43

Updated: 28/Feb/24 17:13

Resolved: 16/Apr/20 17:47