Details
Bug
Status: Open
Urgent
Resolution: Unresolved
Critical
Description
The changes in CASSANDRA-9143 modified how incremental repair works. It now follows this sequence of events:
- Anticompaction is executed on all replicas for all SSTables overlapping the repaired ranges
- Anticompacted SSTables are then marked as "pending repair" and can no longer be compacted or take part in another repair session
- Merkle trees are generated and compared
- Streaming takes place if needed
- Anticompaction is committed and the "pending repair" SSTables are marked as repaired if the session succeeded, or they are released if the repair session failed.
If the repair coordinator dies during the streaming phase, the SSTables on the replicas will remain in "pending repair" state and will never be eligible for repair or compaction, even after all the nodes in the cluster are restarted.
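The failure mode can be illustrated with a toy model of the sequence above. This is only a sketch of the behaviour described in this ticket, not Cassandra code: the function name and the outcome names are hypothetical and chosen for illustration.

```shell
#!/usr/bin/env bash
# Toy model only: names below are illustrative, not Cassandra identifiers.

# replica_state_after <outcome>
#   outcome: success | failure | coordinator_died_during_streaming
# Prints the state in which a replica's anticompacted SSTables end up,
# following the sequence of events listed in this ticket.
replica_state_after() {
  local state="unrepaired"
  # Anticompaction runs, then the SSTables are marked as pending repair.
  state="pending repair"
  # Merkle tree comparison and streaming do not change the SSTable state.
  case "$1" in
    success) state="repaired" ;;            # final step: commit
    failure) state="released" ;;            # final step: release
    coordinator_died_during_streaming) ;;   # final step never runs
    *) echo "unknown outcome: $1" >&2; return 1 ;;
  esac
  echo "$state"
}
```

In this model only the last step moves SSTables out of "pending repair", so a coordinator that disappears before reaching it leaves the replicas stuck, which matches what is reported here.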
Steps to reproduce (I used Jason's 13938 branch, which fixes streaming errors):
ccm create inc-repair-issue -v github:jasobrown/13938 -n 3

# Allow jmx access and remove all rpc_ settings in yaml
for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra-env.sh; do
  sed -i'' -e 's/com.sun.management.jmxremote.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' $f
done

for f in ~/.ccm/inc-repair-issue/node*/conf/cassandra.yaml; do
  grep -v "rpc_" $f > ${f}.tmp
  cat ${f}.tmp > $f
done

ccm start
I used tlp-stress to generate a few tens of MB of data (I killed it after some time). cassandra-stress would work as well:
bin/tlp-stress run BasicTimeSeries -i 1M -p 1M -t 2 --rate 5000 --replication "{'class':'SimpleStrategy', 'replication_factor':2}" --compaction "{'class': 'SizeTieredCompactionStrategy'}" --host 127.0.0.1
Flush node1, then stop it, delete all of its SSTables and start it again:
ccm node1 nodetool flush
ccm node1 stop
rm -f ~/.ccm/inc-repair-issue/node1/data0/tlp_stress/sensor*/*.*
ccm node1 start
Then throttle streaming throughput to 1MB/s so we have time to take node1 down during the streaming phase and run repair:
ccm node1 nodetool setstreamthroughput 1
ccm node2 nodetool setstreamthroughput 1
ccm node3 nodetool setstreamthroughput 1
ccm node1 nodetool repair tlp_stress
Once streaming starts, shut down node1 (the repair coordinator) and start it again:
ccm node1 stop
ccm node1 start
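To time the shutdown without watching the logs, one option is to poll `nodetool netstats` until it reports activity. The sketch below is an assumption on my part rather than part of the original reproduction: the helper names are hypothetical, and it relies on an idle node printing "Not sending any streams." in its netstats output, which may differ between versions.

```shell
#!/usr/bin/env bash

# is_streaming: read `nodetool netstats` output on stdin and succeed if a
# stream session appears to be active.
# Assumption: the output contains a "Mode:" line, and an idle node prints
# "Not sending any streams.".
is_streaming() {
  local out
  out="$(cat)"
  grep -q 'Mode:' <<<"$out" && ! grep -q 'Not sending any streams' <<<"$out"
}

# wait_for_streaming <node> [max_attempts]
# Poll `ccm <node> nodetool netstats` once per second until streaming is
# detected. Returns 1 if nothing was detected after max_attempts polls.
wait_for_streaming() {
  local node="$1" attempts="${2:-300}" i
  for ((i = 0; i < attempts; i++)); do
    if ccm "$node" nodetool netstats | is_streaming; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

With the repair running in another terminal, `wait_for_streaming node1 && ccm node1 stop && ccm node1 start` would restart the coordinator shortly after the first stream shows up.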
Run repair again:
ccm node1 nodetool repair tlp_stress
The command returns very quickly, showing that it skipped all SSTables, while node1 still holds almost no data:
[2018-08-31 19:05:16,292] Repair completed successfully
[2018-08-31 19:05:16,292] Repair command #1 finished in 2 seconds

$ ccm node1 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  228,64 KiB  256     ?     437dc9cd-b1a1-41a5-961e-cfc99763e29f  rack1
UN  127.0.0.2  60,09 MiB   256     ?     fbcbbdbb-e32a-4716-8230-8ca59aa93e62  rack1
UN  127.0.0.3  57,59 MiB   256     ?     a0b1bcc6-0fad-405a-b0bf-180a0ca31dd0  rack1
sstablemetadata then shows that nodes 2 and 3 have SSTables still in "pending repair" state:
~/.ccm/repository/gitCOLONtrunk/tools/bin/sstablemetadata na-4-big-Data.db | grep repair
SSTable: /Users/adejanovski/.ccm/inc-repair-4.0/node2/data0/tlp_stress/sensor_data-b7375660ad3111e8a0e59357ff9c9bda/na-4-big
Pending repair: 3844a400-ad33-11e8-b5a7-6b8dd8f31b62
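To check every SSTable of a table rather than a single file, the same check can be wrapped in a small loop. This is a sketch under assumptions: the function names are hypothetical, and it assumes sstablemetadata prints a "Pending repair:" line with either a session id or "--" when there is none.

```shell
#!/usr/bin/env bash

# pending_repair_id: read sstablemetadata output on stdin and print the
# pending repair session id, or nothing if the SSTable has none.
# Assumption: "--" (or an empty value) means "no pending repair".
pending_repair_id() {
  awk -F': *' '/^Pending repair:/ && $2 != "" && $2 != "--" { print $2 }'
}

# list_pending_sstables <sstablemetadata-command> <table-data-dir>
# Print "<data file> <session id>" for each SSTable still pending repair.
list_pending_sstables() {
  local tool="$1" dir="$2" f id
  for f in "$dir"/*-Data.db; do
    [ -e "$f" ] || continue   # the glob matched nothing
    id="$("$tool" "$f" | pending_repair_id)"
    [ -n "$id" ] && echo "$f $id"
  done
  return 0
}
```

It would be called with the path to the sstablemetadata binary and the data directory of the table on node2 or node3. Any line it prints after the coordinator restart and the second repair corresponds to an SSTable stuck in the state described in this ticket.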
Restarting these nodes does not help either.