Details
Bug
Status: Resolved
Normal
Resolution: Fixed
None
None
Normal
Description
Read repair does not always work. At the very least, we allow the CL.ALL contract to be violated. To reproduce, create a three-node cluster with RF=3 and run json2sstable on one of the attached JSON files on each node. This creates a row with key 'test' that has 9 columns in total, but each node holds only 3 of them. If you run get_count on this row several times in quick succession at CL.ALL, you will sometimes receive a count of 6 and sometimes 9. Once the ReadRepairManager has sent the repairs, you always get 9, which is the desired behavior.
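To make the expected CL.ALL result concrete, here is a toy Python model of the reproduction data. It is not Cassandra code: the column names (`col0`..`col8`) and the helpers `make_replicas`, `resolve` and `get_count_all` are hypothetical, and it assumes each replica holds a disjoint third of the 9 columns, so reconciling replicas reduces to a set union (real reconciliation also compares column timestamps). Under this model, merging all three replicas gives 9, while merging only two gives 6. That is one way to arrive at the count of 6, offered as an illustration rather than a diagnosis.

```python
# Toy model (NOT Cassandra code) of the row 'test' from the reproduction.
# Assumption: each replica holds a disjoint set of 3 of the 9 columns.
from typing import Dict, List

# Hypothetical row representation: column name -> value (timestamps omitted).
Row = Dict[str, str]


def make_replicas(num_replicas: int = 3, cols_per_replica: int = 3) -> List[Row]:
    """Build one partial copy of the row per replica, with disjoint columns."""
    replicas: List[Row] = []
    for r in range(num_replicas):
        start = r * cols_per_replica
        replicas.append(
            {f"col{i}": f"value{i}" for i in range(start, start + cols_per_replica)}
        )
    return replicas


def resolve(responses: List[Row]) -> Row:
    """Merge replica responses into one row.

    Because the columns are disjoint in this model, a plain union is enough;
    a real resolver would keep the newest version of each column.
    """
    merged: Row = {}
    for row in responses:
        merged.update(row)
    return merged


def get_count_all(replicas: List[Row]) -> int:
    """Column count expected at CL.ALL: every replica's data is reconciled."""
    return len(resolve(replicas))


if __name__ == "__main__":
    replicas = make_replicas()
    print(get_count_all(replicas))      # all three replicas merged
    print(len(resolve(replicas[:2])))   # only two replicas merged
```

Running it prints 9 and then 6, matching the two counts observed in the reproduction.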
I have another data set, obtained in the wild, that never fully repairs, but it is a bit too large to attach (about 600 columns per machine). I'm still trying to figure out why RR isn't working on this set: reads at any CL, including ALL, keep returning different results, no matter how long I wait or how many reads I do.