Details

    • Improvement
    • Status: Resolved
    • Normal
    • Resolution: Duplicate

    Description

      jbellis mentioned this as a potential improvement in his 2013 committer meeting notes (http://grokbase.com/t/cassandra/dev/132s6sh415/notes-from-committers-meeting-streaming-and-repair): "making the repair coordinator smarter to know when to avoid duplicate streaming. E.g., if replicas A and B have row X, but C does not, currently both A and B will stream to C."
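      The behaviour described in that quote can be illustrated with a deliberately simplified model. The sketch below is not Cassandra code: real repair compares Merkle trees over token ranges, whereas here each replica is just a set of row keys, and `pairwise_streams` is a hypothetical helper name. It only shows why syncing every mismatched pair independently makes C receive row X twice.

```python
from itertools import combinations


def pairwise_streams(replicas):
    """Model pairwise sync: every mismatched pair exchanges what the other lacks.

    `replicas` maps a node name to the set of row keys it holds (an assumption
    of this sketch; real repair works on Merkle trees, not row sets).
    Returns a list of (source, target, rows) transfers.
    """
    transfers = []
    for a, b in combinations(sorted(replicas), 2):
        missing_on_b = replicas[a] - replicas[b]
        missing_on_a = replicas[b] - replicas[a]
        if missing_on_b:
            transfers.append((a, b, missing_on_b))
        if missing_on_a:
            transfers.append((b, a, missing_on_a))
    return transfers


# Replicas A and B have row X, C does not.
transfers = pairwise_streams({"A": {"X"}, "B": {"X"}, "C": set()})
for source, target, rows in transfers:
    print(f"{source} -> {target}: {sorted(rows)}")
```

      Both the (A, C) and the (B, C) comparison produce a transfer of X to C, which is the duplicate this ticket is about.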

      I tested in C* 3.0.6 and it looks like this is still happening. On a 3-node cluster I inserted into a trivial table under a keyspace with RF=3 and forced two flushes on all nodes, so that each node had two SSTables. I then shut down the 1st node, removed one SSTable from its data directory, and restarted the node. I connected cqlsh to this node and verified that with CL.ONE the data was indeed missing. I then moved on to the 2nd node and ran "nodetool repair <keyspace> <table>". Here is what I observed in system.log on the 2nd node (as repair coordinator):

      INFO  [Thread-47] 2016-06-06 23:19:54,173  RepairRunnable.java:125 - Starting repair command #1, repairing keyspace weitest with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [songs], dataCenters: [], hosts: [], # of ranges: 3)
      INFO  [Thread-47] 2016-06-06 23:19:54,253  RepairSession.java:237 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] new session: will sync /172.31.44.75, /172.31.40.215, /172.31.36.148 on range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] for weitest.[songs]
      INFO  [Repair#1:1] 2016-06-06 23:19:54,268  RepairJob.java:172 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Requesting merkle trees for songs (to [/172.31.40.215, /172.31.36.148, /172.31.44.75])
      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,335  RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.40.215
      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,427  RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.44.75
      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,460  RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.36.148
      INFO  [RepairJobTask:1] 2016-06-06 23:19:54,466  SyncTask.java:73 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.40.215 and /172.31.36.148 have 3 range(s) out of sync for songs
      INFO  [RepairJobTask:1] 2016-06-06 23:19:54,467  RemoteSyncTask.java:54 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Forwarding streaming repair of 3 ranges to /172.31.40.215 (to be streamed with /172.31.36.148)
      INFO  [RepairJobTask:1] 2016-06-06 23:19:54,472  SyncTask.java:66 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.36.148 and /172.31.44.75 are consistent for songs
      INFO  [RepairJobTask:3] 2016-06-06 23:19:54,474  SyncTask.java:73 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.40.215 and /172.31.44.75 have 3 range(s) out of sync for songs
      INFO  [RepairJobTask:3] 2016-06-06 23:19:54,529  LocalSyncTask.java:68 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Performing streaming repair of 3 ranges with /172.31.40.215
      INFO  [RepairJobTask:3] 2016-06-06 23:19:54,574  StreamResultFuture.java:86 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Executing streaming plan for Repair
      INFO  [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,576  StreamSession.java:238 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Starting streaming to /172.31.40.215
      INFO  [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,580  StreamCoordinator.java:213 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Beginning stream session with /172.31.40.215
      INFO  [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:54,588  StreamResultFuture.java:168 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 1 files(174 bytes)
      INFO  [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,117  StreamResultFuture.java:182 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Session with /172.31.40.215 is complete
      INFO  [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,120  StreamResultFuture.java:214 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] All sessions completed
      INFO  [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,123  LocalSyncTask.java:114 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Sync complete using session 2d177cc0-2c3d-11e6-94d2-b35b6c93de57 between /172.31.40.215 and /172.31.44.75 on songs
      INFO  [RepairJobTask:3] 2016-06-06 23:19:55,123  RepairJob.java:143 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] songs is fully synced
      INFO  [RepairJobTask:3] 2016-06-06 23:19:55,125  RepairSession.java:279 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Session completed successfully
      INFO  [RepairJobTask:3] 2016-06-06 23:19:55,126  RepairRunnable.java:240 - Repair session 2d177cc0-2c3d-11e6-94d2-b35b6c93de57 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished
      INFO  [CompactionExecutor:991] 2016-06-06 23:19:55,131  CompactionManager.java:511 - Starting anticompaction for weitest.songs on 2/[BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-1-big-Data.db')] sstables
      INFO  [CompactionExecutor:991] 2016-06-06 23:19:55,131  CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
      INFO  [CompactionExecutor:991] 2016-06-06 23:19:55,135  CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-1-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
      INFO  [CompactionExecutor:991] 2016-06-06 23:19:55,137  CompactionManager.java:578 - Completed anticompaction successfully
      INFO  [InternalResponseStage:8] 2016-06-06 23:19:55,145  RepairRunnable.java:322 - Repair command #1 finished in 0 seconds
      

      This is the log from the 1st node, where one SSTable was missing and needed to be repaired. It confirms that two equivalent streams arrived from the two other replicas:

      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,307  Validator.java:274 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Sending completed merkle tree to /172.31.44.75 for weitest.songs
      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,470  StreamingRepairTask.java:58 - [streaming task #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Performing streaming repair of 3 ranges with /172.31.36.148
      INFO  [AntiEntropyStage:1] 2016-06-06 23:19:54,497  StreamResultFuture.java:86 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Executing streaming plan for Repair
      INFO  [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,498  StreamSession.java:238 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Starting streaming to /172.31.36.148
      INFO  [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,512  StreamCoordinator.java:213 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483, ID#0] Beginning stream session with /172.31.36.148
      INFO  [STREAM-IN-/172.31.36.148] 2016-06-06 23:19:54,562  StreamResultFuture.java:168 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483 ID#0] Prepare completed. Receiving 1 files(174 bytes), sending 0 files(0 bytes)
      INFO  [STREAM-INIT-/172.31.44.75:57066] 2016-06-06 23:19:54,579  StreamResultFuture.java:111 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Creating new streaming plan for Repair
      INFO  [STREAM-INIT-/172.31.44.75:57066] 2016-06-06 23:19:54,580  StreamResultFuture.java:118 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Received streaming plan for Repair
      INFO  [STREAM-INIT-/172.31.44.75:47984] 2016-06-06 23:19:54,581  StreamResultFuture.java:118 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Received streaming plan for Repair
      INFO  [STREAM-IN-/172.31.44.75] 2016-06-06 23:19:54,584  StreamResultFuture.java:168 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Prepare completed. Receiving 1 files(174 bytes), sending 0 files(0 bytes)
      INFO  [StreamReceiveTask:1] 2016-06-06 23:19:55,034  StreamResultFuture.java:182 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Session with /172.31.36.148 is complete
      INFO  [StreamReceiveTask:1] 2016-06-06 23:19:55,037  StreamResultFuture.java:214 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] All sessions completed
      INFO  [StreamReceiveTask:1] 2016-06-06 23:19:55,040  StreamingRepairTask.java:85 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] streaming task succeed, returning response to /172.31.44.75
      INFO  [StreamReceiveTask:2] 2016-06-06 23:19:55,114  StreamResultFuture.java:182 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Session with /172.31.44.75 is complete
      INFO  [StreamReceiveTask:2] 2016-06-06 23:19:55,115  StreamResultFuture.java:214 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] All sessions completed
      INFO  [CompactionExecutor:3] 2016-06-06 23:19:55,130  CompactionManager.java:511 - Starting anticompaction for weitest.songs on 1/[BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-4-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-3-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db')] sstables
      INFO  [CompactionExecutor:3] 2016-06-06 23:19:55,131  CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
      INFO  [CompactionExecutor:3] 2016-06-06 23:19:55,135  CompactionManager.java:578 - Completed anticompaction successfully
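
      The duplicate can also be spotted mechanically by tallying the "Prepare completed" lines in the log above. The following is a rough sketch, assuming the log line layout shown in this ticket; the regular expression and the `incoming_streams` helper are illustrative and not part of any Cassandra tooling. The two sample lines are copied from the 1st node's log.

```python
import re

# Matches the "Prepare completed" lines as they appear in the log excerpt above.
PREPARE = re.compile(
    r"\[STREAM-IN-/(?P<peer>[\d.]+)\].*"
    r"\[Stream #(?P<stream>[0-9a-f-]+).*"
    r"Prepare completed\. Receiving (?P<files>\d+) files\((?P<nbytes>\d+) bytes\)"
)


def incoming_streams(log_lines):
    """Return (peer, stream id, files, bytes) for each stream that delivers files."""
    found = []
    for line in log_lines:
        m = PREPARE.search(line)
        if m and int(m["files"]) > 0:
            found.append((m["peer"], m["stream"], int(m["files"]), int(m["nbytes"])))
    return found


sample = [
    "INFO  [STREAM-IN-/172.31.36.148] 2016-06-06 23:19:54,562  StreamResultFuture.java:168 - "
    "[Stream #2d38e770-2c3d-11e6-80ed-e382fc580483 ID#0] Prepare completed. "
    "Receiving 1 files(174 bytes), sending 0 files(0 bytes)",
    "INFO  [STREAM-IN-/172.31.44.75] 2016-06-06 23:19:54,584  StreamResultFuture.java:168 - "
    "[Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Prepare completed. "
    "Receiving 1 files(174 bytes), sending 0 files(0 bytes)",
]

streams = incoming_streams(sample)
for peer, stream_id, files, nbytes in streams:
    print(f"from {peer}: {files} file(s), {nbytes} bytes (stream {stream_id})")
```

      Two incoming streams of the same size from two different peers within one repair session is the pattern to look for.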
      

      Making the repair coordinator smarter so that it avoids duplicate streaming would be a welcome improvement for environments where compaction can easily fall behind because of the many small SSTables arriving from repair streaming (LCS and the now-obsolete DTCS both suffer from this symptom a lot).
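
      One possible shape for such a "smarter" coordinator is sketched below. This is only an illustration under stated assumptions, not the approach taken in CASSANDRA-3200 or anywhere in Cassandra: each replica's data for a range is reduced to a single digest (a stand-in for a Merkle tree comparison), and the node names, range label, and `plan_streams` function are hypothetical. The idea is that a replica fetches each differing version of a range from only one of the replicas that hold it.

```python
from collections import defaultdict


def plan_streams(range_hashes):
    """Plan at most one incoming stream per distinct version of each range.

    `range_hashes` maps a token range label to {node: digest of that node's
    data for the range}. Returns a sorted list of (source, target, range).
    """
    plan = []
    for token_range, by_node in range_hashes.items():
        # Group replicas that hold identical data for this range.
        groups = defaultdict(list)
        for node, digest in by_node.items():
            groups[digest].append(node)
        for target, own_digest in by_node.items():
            for digest, holders in groups.items():
                if digest != own_digest:
                    # Any holder of this version will do; pick the first by name.
                    plan.append((min(holders), target, token_range))
    return sorted(plan)


# n1 lost an SSTable; n2 and n3 agree with each other.
plan = plan_streams({"r1": {"n1": "partial", "n2": "full", "n3": "full"}})
for source, target, token_range in plan:
    print(f"{source} -> {target} for {token_range}")
```

      In this model n1 is streamed to by n2 only, rather than by both n2 and n3. Because a digest mismatch does not say which side is missing data, n2 and n3 still each plan a stream from n1, just as the pairwise tasks in the logs above sync in both directions.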

          Activity

            Sylvain Lebresne (slebresne) added a comment:

            This is indeed still happening, and we in fact have a ticket for this: CASSANDRA-3200. As I mentioned in that ticket, I looked at it a while ago and found that making it work properly meant a lot of fighting with the current code, and I kind of gave up (hence the current "later" resolution). It is possible, though, and would absolutely be an improvement, probably a non-negligible one in many cases. That said, I think repair has quite a few issues, and I'm currently curious about where we can go with CASSANDRA-8911 and whether that couldn't just be a better way to do repair in the long run (I mention it so that anyone willing to invest a lot of time changing/optimizing the current repair is aware of it).

            In any case, closing this as a duplicate of CASSANDRA-3200, since there has already been some discussion in that latter ticket. Feel free to re-open that ticket if you want to bring attention to the issue (but, for what it's worth, I don't personally intend to spend time on that issue in the short term, for the reasons I mention above).

            People

              Unassigned
              weideng Wei Deng
              Votes: 0
              Watchers: 2
