Details
- Type: Improvement
- Status: Resolved
- Priority: Normal
- Resolution: Duplicate
Description
jbellis mentioned this as a potential improvement in his 2013 committer meeting notes (http://grokbase.com/t/cassandra/dev/132s6sh415/notes-from-committers-meeting-streaming-and-repair): "making the repair coordinator smarter to know when to avoid duplicate streaming. E.g., if replicas A and B have row X, but C does not, currently both A and B will stream to C."
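To make the pairwise behavior concrete, here is a minimal, self-contained sketch (illustrative Java only, not the actual Cassandra repair classes) of how comparing merkle results two replicas at a time produces one sync task per differing pair, so both A and B end up streaming the same data to C:

import java.util.*;

// Illustrative model only, not the actual Cassandra repair code.
// Each replica reports one hash per token range; the coordinator compares
// replicas two at a time and creates a sync task for every differing pair.
public class PairwiseSyncDemo {
    public static void main(String[] args) {
        // Hash each replica holds for one token range; C is missing data.
        Map<String, String> rangeHash = new LinkedHashMap<>();
        rangeHash.put("A", "h1");
        rangeHash.put("B", "h1");
        rangeHash.put("C", "h0");

        List<String> replicas = new ArrayList<>(rangeHash.keySet());
        for (int i = 0; i < replicas.size(); i++) {
            for (int j = i + 1; j < replicas.size(); j++) {
                String r1 = replicas.get(i), r2 = replicas.get(j);
                if (!rangeHash.get(r1).equals(rangeHash.get(r2)))
                    System.out.printf("sync task: %s <-> %s%n", r1, r2);
            }
        }
        // Prints "sync task: A <-> C" and "sync task: B <-> C":
        // C receives the same missing data twice, once from each replica.
    }
}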
I tested this on C* 3.0.6 and it looks like this is still happening. On a 3-node cluster, I inserted into a trivial table under a keyspace with RF=3 and forced two flushes on all nodes so that each node had two SSTables. I then shut down the 1st node, removed one SSTable from its data directory, and restarted the node. I connected cqlsh to this node and verified that with CL.ONE the data was indeed missing. I then moved on to the 2nd node and ran "nodetool repair <keyspace> <table>"; here is what I observed in system.log on the 2nd node (the repair coordinator):
INFO [Thread-47] 2016-06-06 23:19:54,173 RepairRunnable.java:125 - Starting repair command #1, repairing keyspace weitest with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [songs], dataCenters: [], hosts: [], # of ranges: 3)
INFO [Thread-47] 2016-06-06 23:19:54,253 RepairSession.java:237 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] new session: will sync /172.31.44.75, /172.31.40.215, /172.31.36.148 on range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] for weitest.[songs]
INFO [Repair#1:1] 2016-06-06 23:19:54,268 RepairJob.java:172 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Requesting merkle trees for songs (to [/172.31.40.215, /172.31.36.148, /172.31.44.75])
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,335 RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.40.215
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,427 RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.44.75
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,460 RepairSession.java:181 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Received merkle tree for songs from /172.31.36.148
INFO [RepairJobTask:1] 2016-06-06 23:19:54,466 SyncTask.java:73 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.40.215 and /172.31.36.148 have 3 range(s) out of sync for songs
INFO [RepairJobTask:1] 2016-06-06 23:19:54,467 RemoteSyncTask.java:54 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Forwarding streaming repair of 3 ranges to /172.31.40.215 (to be streamed with /172.31.36.148)
INFO [RepairJobTask:1] 2016-06-06 23:19:54,472 SyncTask.java:66 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.36.148 and /172.31.44.75 are consistent for songs
INFO [RepairJobTask:3] 2016-06-06 23:19:54,474 SyncTask.java:73 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Endpoints /172.31.40.215 and /172.31.44.75 have 3 range(s) out of sync for songs
INFO [RepairJobTask:3] 2016-06-06 23:19:54,529 LocalSyncTask.java:68 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Performing streaming repair of 3 ranges with /172.31.40.215
INFO [RepairJobTask:3] 2016-06-06 23:19:54,574 StreamResultFuture.java:86 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Executing streaming plan for Repair
INFO [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,576 StreamSession.java:238 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Starting streaming to /172.31.40.215
INFO [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,580 StreamCoordinator.java:213 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Beginning stream session with /172.31.40.215
INFO [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:54,588 StreamResultFuture.java:168 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 1 files(174 bytes)
INFO [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,117 StreamResultFuture.java:182 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Session with /172.31.40.215 is complete
INFO [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,120 StreamResultFuture.java:214 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] All sessions completed
INFO [STREAM-IN-/172.31.40.215] 2016-06-06 23:19:55,123 LocalSyncTask.java:114 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Sync complete using session 2d177cc0-2c3d-11e6-94d2-b35b6c93de57 between /172.31.40.215 and /172.31.44.75 on songs
INFO [RepairJobTask:3] 2016-06-06 23:19:55,123 RepairJob.java:143 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] songs is fully synced
INFO [RepairJobTask:3] 2016-06-06 23:19:55,125 RepairSession.java:279 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Session completed successfully
INFO [RepairJobTask:3] 2016-06-06 23:19:55,126 RepairRunnable.java:240 - Repair session 2d177cc0-2c3d-11e6-94d2-b35b6c93de57 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished
INFO [CompactionExecutor:991] 2016-06-06 23:19:55,131 CompactionManager.java:511 - Starting anticompaction for weitest.songs on 2/[BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-1-big-Data.db')] sstables
INFO [CompactionExecutor:991] 2016-06-06 23:19:55,131 CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
INFO [CompactionExecutor:991] 2016-06-06 23:19:55,135 CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-1-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
INFO [CompactionExecutor:991] 2016-06-06 23:19:55,137 CompactionManager.java:578 - Completed anticompaction successfully
INFO [InternalResponseStage:8] 2016-06-06 23:19:55,145 RepairRunnable.java:322 - Repair command #1 finished in 0 seconds
These are the log entries from the 1st node, where one SSTable was missing and needed to be repaired, confirming that two equivalent streams arrived from the two replica nodes:
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,307 Validator.java:274 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Sending completed merkle tree to /172.31.44.75 for weitest.songs
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,470 StreamingRepairTask.java:58 - [streaming task #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] Performing streaming repair of 3 ranges with /172.31.36.148
INFO [AntiEntropyStage:1] 2016-06-06 23:19:54,497 StreamResultFuture.java:86 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Executing streaming plan for Repair
INFO [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,498 StreamSession.java:238 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Starting streaming to /172.31.36.148
INFO [StreamConnectionEstablisher:1] 2016-06-06 23:19:54,512 StreamCoordinator.java:213 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483, ID#0] Beginning stream session with /172.31.36.148
INFO [STREAM-IN-/172.31.36.148] 2016-06-06 23:19:54,562 StreamResultFuture.java:168 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483 ID#0] Prepare completed. Receiving 1 files(174 bytes), sending 0 files(0 bytes)
INFO [STREAM-INIT-/172.31.44.75:57066] 2016-06-06 23:19:54,579 StreamResultFuture.java:111 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Creating new streaming plan for Repair
INFO [STREAM-INIT-/172.31.44.75:57066] 2016-06-06 23:19:54,580 StreamResultFuture.java:118 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Received streaming plan for Repair
INFO [STREAM-INIT-/172.31.44.75:47984] 2016-06-06 23:19:54,581 StreamResultFuture.java:118 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57, ID#0] Received streaming plan for Repair
INFO [STREAM-IN-/172.31.44.75] 2016-06-06 23:19:54,584 StreamResultFuture.java:168 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57 ID#0] Prepare completed. Receiving 1 files(174 bytes), sending 0 files(0 bytes)
INFO [StreamReceiveTask:1] 2016-06-06 23:19:55,034 StreamResultFuture.java:182 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] Session with /172.31.36.148 is complete
INFO [StreamReceiveTask:1] 2016-06-06 23:19:55,037 StreamResultFuture.java:214 - [Stream #2d38e770-2c3d-11e6-80ed-e382fc580483] All sessions completed
INFO [StreamReceiveTask:1] 2016-06-06 23:19:55,040 StreamingRepairTask.java:85 - [repair #2d177cc0-2c3d-11e6-94d2-b35b6c93de57] streaming task succeed, returning response to /172.31.44.75
INFO [StreamReceiveTask:2] 2016-06-06 23:19:55,114 StreamResultFuture.java:182 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] Session with /172.31.44.75 is complete
INFO [StreamReceiveTask:2] 2016-06-06 23:19:55,115 StreamResultFuture.java:214 - [Stream #2d423640-2c3d-11e6-94d2-b35b6c93de57] All sessions completed
INFO [CompactionExecutor:3] 2016-06-06 23:19:55,130 CompactionManager.java:511 - Starting anticompaction for weitest.songs on 1/[BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-4-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-3-big-Data.db'), BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db')] sstables
INFO [CompactionExecutor:3] 2016-06-06 23:19:55,131 CompactionManager.java:540 - SSTable BigTableReader(path='/mnt/ephemeral/cassandra/data/weitest/songs-b254f711134611e692c45f08f496518a/ma-2-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
INFO [CompactionExecutor:3] 2016-06-06 23:19:55,135 CompactionManager.java:578 - Completed anticompaction successfully
Making the repair coordinator smart enough to avoid duplicate streaming would be a welcome improvement for environments where compaction can easily fall behind under the flood of small SSTables produced by repair streaming (LCS and the now-obsolete DTCS both suffer badly from this symptom).
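For illustration, one possible direction in the spirit of CASSANDRA-3200 is to compare all replicas' trees for a range together and stream to each stale replica from a single source. The following is a hypothetical sketch only; the class names and the majority-hash heuristic are my own simplifications, not how the actual repair code decides anything:

import java.util.*;

// Hypothetical sketch of a deduplicating coordinator: compare all trees
// for a range together, then stream to each stale replica from a single
// source. The majority-hash heuristic below is an illustrative shortcut.
public class DedupSyncSketch {
    public static void main(String[] args) {
        Map<String, String> rangeHash = new LinkedHashMap<>();
        rangeHash.put("A", "h1");
        rangeHash.put("B", "h1");
        rangeHash.put("C", "h0"); // stale replica

        // Group replicas by hash and treat the largest group as up to date.
        Map<String, List<String>> byHash = new LinkedHashMap<>();
        rangeHash.forEach((replica, h) ->
                byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(replica));
        List<String> upToDate = Collections.max(byHash.values(),
                Comparator.comparingInt(List::size));

        for (String replica : rangeHash.keySet()) {
            if (!upToDate.contains(replica))
                // One stream per stale replica, from one source only.
                System.out.printf("stream: %s -> %s%n", upToDate.get(0), replica);
        }
        // Prints only "stream: A -> C"; B no longer sends a duplicate.
    }
}

The hard part, as the discussion in CASSANDRA-3200 suggests, is that real merkle responses can disagree in more than two ways per range, and a tree hash alone cannot say which replica has the "right" data, so a real implementation would need considerably more than this majority heuristic.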
Issue Links
- duplicates CASSANDRA-3200: Repair: compare all trees together (for a given range/cf) instead of by pair in isolation (Resolved)
Comments
This is indeed still happening, and we in fact have a ticket for it: CASSANDRA-3200. As I mentioned in that ticket, I looked at this a while ago and found there was a lot of fighting with the current code to make it work properly, and I kind of gave up (hence that ticket's current "Later" resolution). That said, it is possible and would absolutely be an improvement, probably a non-negligible one in many cases. At the same time, repair has quite a few issues, and I'm currently curious where we can go with CASSANDRA-8911 and whether that couldn't simply be a better way to do repair in the long run (mentioning this so that anyone willing to invest lots of time changing/optimizing the current repair is aware of it). In any case, I'm closing this as a duplicate of CASSANDRA-3200, since there has already been some discussion in that ticket. Feel free to re-open that ticket if you want to bring attention to the issue (but, for what it's worth, I don't personally intend to spend time on it in the short term, for the reasons above).