  1. Cassandra
  2. CASSANDRA-1316

Read repair does not always work correctly

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 0.6.4
    • None
    • None
    • Normal

    Description

      Read repair does not always work. At the least, we allow violation of the CL.ALL contract. To reproduce, create a three-node cluster with RF=3 and use json2sstable to load a different one of the attached JSON files on each node. This creates a row with key 'test' that has 9 columns in total, but only 3 of them are on each machine. If you call get_count on this row in quick succession at CL.ALL, you will sometimes receive a count of 6 and sometimes 9. After the ReadRepairManager has sent the repairs, you will always get 9, which is the desired behavior.
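
      The expected counts can be illustrated with a toy model. This is only a sketch and not Cassandra's code: the column names, timestamps, and the helpers resolve, get_count, and read_repair are hypothetical. It assumes that a count of 6 corresponds to resolving only two of the three replica responses, which is one possible explanation and not a confirmed diagnosis.

```python
from itertools import combinations

# Hypothetical replica contents: each node holds 3 of the row's 9 columns.
# Each column maps name -> (value, timestamp).
replicas = {
    "node1": {f"col{i}": (f"val{i}", 1) for i in range(0, 3)},
    "node2": {f"col{i}": (f"val{i}", 1) for i in range(3, 6)},
    "node3": {f"col{i}": (f"val{i}", 1) for i in range(6, 9)},
}


def resolve(responses):
    """Merge column maps, keeping the newest timestamp for each column."""
    merged = {}
    for columns in responses:
        for name, (value, ts) in columns.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged


def get_count(responses):
    """Count the columns visible after resolving the given responses."""
    return len(resolve(responses))


def read_repair(cluster):
    """Push the fully merged row back to every replica (toy read repair)."""
    merged = resolve(cluster.values())
    for columns in cluster.values():
        columns.update(merged)


# A CL.ALL read that resolves all three responses sees every column.
all_count = get_count(replicas.values())

# Resolving only two of the three responses sees 6 columns (assumed failure mode).
partial_counts = {get_count(pair) for pair in combinations(replicas.values(), 2)}

# After repair, any subset of replicas yields the full row.
repaired = {node: dict(columns) for node, columns in replicas.items()}
read_repair(repaired)
repaired_counts = {get_count(pair) for pair in combinations(repaired.values(), 2)}

print(all_count, partial_counts, repaired_counts)
```

      In this model a correct CL.ALL read returns 9 even before any repair has run, which is why an intermittent count of 6 violates the CL.ALL contract.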

      I have another data set, obtained in the wild, which never fully repairs for some reason, but it is a bit large to attach (about 600 columns per machine). I am still trying to figure out why RR is not working on this set, but I always get different results when reading at any CL, including ALL, no matter how long I wait or how many reads I do.

      Attachments

        1. cassandra-1.json
          0.1 kB
          Brandon Williams
        2. cassandra-2.json
          0.1 kB
          Brandon Williams
        3. cassandra-3.json
          0.1 kB
          Brandon Williams
        4. 001_correct_responsecount_in_RRR.txt
          1 kB
          Brandon Williams
        5. RRR-v2.txt
          3 kB
          Jonathan Ellis
        6. 1316-RRM.txt
          7 kB
          Jonathan Ellis

          People

            Brandon Williams