Cassandra / CASSANDRA-14400

Subrange repair doesn't always mark as repaired


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Not A Problem
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Severity:
      Normal
    • Since Version:
      4.0

      Description

      While experimenting with subrange repair on trunk, I found that if I generate an SSTable containing a single token and then repair that SSTable with a subrange repair, it doesn't get marked as repaired.
       
       Before repair, sstablemetadata outputs:

      First token: -9223362383595311662 (derphead4471291)
      Last token: -9223362383595311662 (derphead4471291)
      Repaired at: 0
      Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
      

      Repair command:

      ccm node1 nodetool "repair -st -9223362383595311663 -et -9223362383595311661 aoeu"
      
      [2018-04-19 05:44:42,806] Starting repair command #7 (c23f76c0-4394-11e8-8f20-3b8ee110d005), repairing keyspace aoeu with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 1, pull repair: false, force repair: false, optimise streams: false)
      [2018-04-19 05:44:42,843] Repair session c242d220-4394-11e8-8f20-3b8ee110d005 for range [(-9223362383595311663,-9223362383595311661]] finished (progress: 20%)
      [2018-04-19 05:44:43,139] Repair completed successfully
      [2018-04-19 05:44:43,140] Repair command #7 finished in 0 seconds
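
As a sanity check that the single token really falls inside the requested subrange, here is a minimal sketch. It assumes the `(start,end]` convention shown in the repair session log above (start exclusive, end inclusive) and treats a range with start >= end as wrapping around the ring. The helper name `in_range` is hypothetical and not part of Cassandra or nodetool.

```python
def in_range(token: int, start: int, end: int) -> bool:
    """Return True if token lies in the half-open range (start, end].

    Assumption: start is exclusive and end is inclusive, matching the
    "(start,end]" notation in the repair log. A range with start >= end
    is treated as wrapping around the ring.
    """
    if start < end:
        return start < token <= end
    # Wrapping range: covers everything after start or up to and including end.
    return token > start or token <= end


# Values taken from the report: the SSTable's only token and the -st/-et bounds.
token = -9223362383595311662
print(in_range(token, -9223362383595311663, -9223362383595311661))  # prints True
```

Under that assumption the token is covered by the requested range, so the range bounds alone do not explain why the SSTable stays unrepaired.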
      

      After the repair, the SSTable hasn't changed and sstablemetadata still outputs:

      First token: -9223362383595311662 (derphead4471291)
      Last token: -9223362383595311662 (derphead4471291)
      Repaired at: 0
      Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
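
To compare the before and after state mechanically, the two repair fields can be pulled out of the text shown above. This is only a sketch: it assumes the `Key: value` line layout seen in this output, and both `parse_repair_fields` and `is_marked_repaired` are hypothetical helpers, not Cassandra tools.

```python
def parse_repair_fields(metadata_text: str) -> dict:
    """Extract the 'Repaired at' and 'Pending repair' values from
    sstablemetadata-style text (assumed to be 'Key: value' per line)."""
    fields = {}
    for line in metadata_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Repaired at", "Pending repair"):
            fields[key] = value.strip()
    return fields


def is_marked_repaired(fields: dict) -> bool:
    """Treat a 'Repaired at' value other than 0 as marked repaired."""
    return fields.get("Repaired at", "0") != "0"


# Sample text copied from the output in this report.
sample = """First token: -9223362383595311662 (derphead4471291)
Last token: -9223362383595311662 (derphead4471291)
Repaired at: 0
Pending repair: 862395e0-4394-11e8-8f20-3b8ee110d005
"""
fields = parse_repair_fields(sample)
print(fields["Pending repair"], is_marked_repaired(fields))
```

Running the same check on the output captured before and after the repair makes it easy to confirm that neither field changed.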
      

      parent_repair_history nevertheless reports that the repair completed and that the range was successful:

      select * from system_distributed.parent_repair_history where parent_id=862395e0-4394-11e8-8f20-3b8ee110d005 ;
      
       parent_id                            | columnfamily_names | exception_message | exception_stacktrace | finished_at                     | keyspace_name | options                                                                                                                                                                                                                                                                        | requested_ranges                                | started_at                      | successful_ranges

       862395e0-4394-11e8-8f20-3b8ee110d005 |           {'aoeu'} |              null |                 null | 2018-04-19 05:43:14.578000+0000 |          aoeu | {'dataCenters': '', 'forceRepair': 'false', 'hosts': '', 'incremental': 'true', 'jobThreads': '1', 'optimiseStreams': 'false', 'parallelism': 'parallel', 'previewKind': 'NONE', 'primaryRange': 'false', 'pullRepair': 'false', 'sub_range_repair': 'true', 'trace': 'false'} | {'(-9223362383595311663,-9223362383595311661]'} | 2018-04-19 05:43:01.952000+0000 | {'(-9223362383595311663,-9223362383595311661]'}
      

      Subrange repairs seem to work fine over large ranges and set Repaired at as expected, but so far I haven't figured out why a large range works while a small one does not.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Kurt Greaves (KurtG)