Cassandra / CASSANDRA-14685

Incremental repair 4.0: SSTables remain locked forever if the coordinator dies during streaming


Details

    • Type: Bug
    • Status: Open
    • Priority: Urgent
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Consistency/Repair
    • Labels: None
    • Severity: Critical

    Description

      The changes in CASSANDRA-9143 modified the way incremental repair works by applying the following sequence of events:

      • Anticompaction is executed on all replicas for all SSTables overlapping the repaired ranges
      • Anticompacted SSTables are then marked as "pending repair" and can no longer be compacted, nor be part of another repair session
      • Merkle trees are generated and compared
      • Streaming takes place if needed
      • Anticompaction is committed: "pending repair" SSTables are marked as repaired if the session succeeded, or released if it failed.

      If the repair coordinator dies during the streaming phase, the SSTables on the replicas will remain in "pending repair" state and will never be eligible for repair or compaction, even after all the nodes in the cluster are restarted. 
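
      As a side note (not part of the original report), both the anticompaction and the resulting "pending repair" marker can be observed from the command line while a session runs; a rough sketch, assuming the ccm cluster set up in the steps below:

      # Anticompaction shows up as a running compaction on the replicas
      ccm node2 nodetool compactionstats

      # Once anticompaction finishes, the affected SSTables carry a repair session id
      # (the SSTable path is illustrative; adjust it to the actual data directory)
      ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata <sstable>-Data.db | grep -i "pending repair"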

      Steps to reproduce (I've used Jason's 13938 branch that fixes streaming errors):

      ccm create inc-repair-issue -v github:jasobrown/13938 -n 3
      
      # Allow jmx access and remove all rpc_ settings in yaml
      for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh;
      do
        sed -i'' -e 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' $f
      done
      
      for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml;
      do
        grep -v "rpc_" $f > ${f}.tmp
        cat ${f}.tmp > $f
      done
      
      ccm start
      

      I used tlp-stress to generate a few tens of MBs of data (killed it after some time); cassandra-stress obviously works as well (a rough equivalent is sketched after this command):

      bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 \
          --replication "{'class':'SimpleStrategy', 'replication_factor':2}" \
          --compaction "{'class': 'SizeTieredCompactionStrategy'}" \
          --host 127.0.0.1
      
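      For reference, a rough cassandra-stress equivalent might look like the following (an untested sketch; cassandra-stress writes to its own keyspace1.standard1 table, so the keyspace used in the later commands would have to change accordingly):

      ~/.ccm/repository/gitCOLONtrunk/tools/bin/cassandra-stress write n=1000000 -rate threads=2 throttle=5000/s \
          -schema "replication(strategy=SimpleStrategy,factor=2)" "compaction(strategy=SizeTieredCompactionStrategy)" \
          -node 127.0.0.1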

      Flush and delete all SSTables in node1:

      ccm node1 nodetool flush
      ccm node1 stop
      rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
      ccm node1 start
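
      Optionally (not part of the original steps), confirm that node1 came back empty for the keyspace:

      ccm node1 nodetool tablestats tlp_stress | grep "Space used (live)"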

      Then throttle streaming throughput to 1MB/s so we have time to take node1 down during the streaming phase and run repair:

      ccm node1 nodetool setstreamthroughput 1
      ccm node2 nodetool setstreamthroughput 1
      ccm node3 nodetool setstreamthroughput 1
      ccm node1 nodetool repair tlp_stress
      
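      From a second terminal (an extra check, not in the original steps), netstats can be polled to tell when the streaming phase has actually begun:

      # Hypothetical helper: wait until node1 reports incoming repair streams
      while ! ccm node1 nodetool netstats | grep -q "Receiving"; do sleep 1; done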

      Once streaming starts, shut down node1 and start it again:

      ccm node1 stop
      ccm node1 start
      

      Run repair again:

      ccm node1 nodetool repair tlp_stress
      

      The command returns very quickly, showing that it skipped all SSTables:

      [2018-08-31 19:05:16,292] Repair completed successfully
      [2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds
      
      $ ccm node1 nodetool status
      
      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load       Tokens       Owns    Host ID                               Rack
      UN  127.0.0.1  228,64 KiB  256          ?       437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
      UN  127.0.0.2  60,09 MiB  256          ?       fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
      UN  127.0.0.3  57,59 MiB  256          ?       a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
      

      sstablemetadata then shows that nodes 2 and 3 still have SSTables in the "pending repair" state:

      ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | grep repair
      SSTable: /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
      Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
      
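      To check every SSTable of that table on nodes 2 and 3 in one pass, a small loop over the data directories works (a sketch, assuming the ccm paths used above):

      for f in ~/.ccm/inc-repair-issue/node{2,3}/data0/tlp_stress/sensor*/*-Data.db;
      do
        echo $f
        ~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata $f | grep -i "pending repair"
      done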

      Restarting these nodes doesn't help either.

          People

            Assignee: Jason Brown
            Reporter: Alexander Dejanovski
            Authors: Jason Brown
            Reviewers: Blake Eggleston
