Details
Description
Our test is basically running nodetool repair on specific keyspaces (such as keyspace1) and the test is also triggering nodetool compact keyspace1 standard1 in the background.
And so it looks like running major compactions & repairs lead to that issue when using LCS.
Below is an excerpt from the thread dump (the rest is attached)
"CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 nid=0x3c4e waiting for monitor entry [0x00007f53340d8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252) - waiting to lock <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager) at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434) at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550) at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - <0x00000006c9362ca8> (a java.util.concurrent.ThreadPoolExecutor$Worker) "CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 nid=0x3c4d runnable [0x00007f5334119000] java.lang.Thread.State: RUNNABLE at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at com.google.common.collect.Iterators.addAll(Iterators.java:361) at com.google.common.collect.Iterables.addAll(Iterables.java:354) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349) - locked <0x00000006d0f7a6a8> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98) - locked <0x00000006d0f7a568> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95) - locked <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
CPU usage is at 100%
top -p 15386 top - 12:12:40 up 1:28, 1 user, load average: 1.08, 1.11, 1.16 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.3 us, 0.0 sy, 12.4 ni, 87.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 16433792 total, 8947336 used, 7486456 free, 89552 buffers KiB Swap: 0 total, 0 used, 0 free. 3326796 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15386 automat+ 20 0 7891448 5.004g 290184 S 102.9 31.9 80:07.06 java
ttop shows that the compaction thread consumes all the CPU
$ java -jar sjk.jar ttop -p 15386 Monitoring threads ... 2016-03-17T12:17:13.514+0000 Process summary process cpu=126.34% application cpu=102.81% (user=102.46% sys=0.35%) other: cpu=23.53% heap allocation rate 375mb/s [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer [000005] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher [000012] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-7199 [000013] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-0 [000015] user= 0.00% sys= 0.00% alloc= 476b/s - AsyncAppender-Worker-ASYNCDEBUGLOG [000016] user= 0.00% sys= 0.05% alloc= 1070b/s - ScheduledTasks:1 [000017] user= 0.00% sys= 0.00% alloc= 33b/s - EXPIRING-MAP-REAPER:1 [000018] user= 0.00% sys= 0.02% alloc= 1932b/s - Background_Reporter:1 [000022] user= 0.00% sys= 0.00% alloc= 0b/s - MemtablePostFlush:1 [000023] user= 0.00% sys= 0.00% alloc= 0b/s - MemtableReclaimMemory:1 [000026] user= 0.00% sys= 0.00% alloc= 0b/s - SlabPoolCleaner [000027] user= 0.00% sys= 0.00% alloc= 0b/s - PERIODIC-COMMIT-LOG-SYNCER [000028] user= 0.00% sys= 0.00% alloc= 0b/s - COMMIT-LOG-ALLOCATOR [000029] user= 0.00% sys= 0.01% alloc= 7086b/s - OptionalTasks:1 [000030] user= 0.00% sys= 0.00% alloc= 0b/s - Reference-Reaper:1 [000031] user= 0.00% sys= 0.00% alloc= 0b/s - Strong-Reference-Leak-Detector:1 [000032] user=99.45% sys= 0.07% alloc= 374mb/s - CompactionExecutor:1 [000033] user= 0.00% sys= 0.00% alloc= 0b/s - CompactionExecutor:2 [000036] user= 0.00% sys= 0.00% alloc= 0b/s - NonPeriodicTasks:1 [000037] user= 0.00% sys= 0.00% alloc= 0b/s - LocalPool-Cleaner:1 [000041] user= 0.00% sys= 0.00% alloc= 0b/s - IndexSummaryManager:1 [000043] user= 0.00% sys= 0.01% alloc= 2705b/s - GossipTasks:1 [000044] user= 0.00% sys= 0.00% alloc= 0b/s - ACCEPT-/10.200.182.146 [000045] user= 0.00% sys= 0.01% alloc= 2283b/s - BatchlogTasks:1 [000055] user= 0.00% sys= 0.02% alloc= 9494b/s - GossipStage:1 [000056] user= 0.00% sys= 0.00% alloc= 0b/s - AntiEntropyStage:1 [000057] user= 0.00% sys= 0.00% alloc= 0b/s - MigrationStage:1 [000058] user= 0.00% sys= 0.00% alloc= 0b/s - MiscStage:1 [000067] user= 0.00% sys= 0.02% alloc= 2445b/s - MessagingService-Incoming-/10.200.182.144 [000068] user= 0.00% sys= 0.01% alloc= 968b/s - MessagingService-Outgoing-/10.200.182.144 [000069] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Outgoing-/10.200.182.144 [000070] user= 0.00% sys= 0.02% alloc= 512b/s - MessagingService-Outgoing-/10.200.182.144 [000072] user= 0.00% sys= 0.00% alloc= 0b/s - NanoTimeToCurrentTimeMillis updater [000073] user= 0.00% sys= 0.02% alloc= 3113b/s - MessagingService-Incoming-/10.200.182.144 [000074] user= 0.00% sys= 0.00% alloc= 0b/s - PendingRangeCalculator:1 [000075] user= 0.00% sys= 0.41% alloc= 66kb/s - SharedPool-Worker-1 [000076] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-2 [000077] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-3 [000078] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-4 [000079] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-5 [000080] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-6 [000081] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-7 [000082] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-8 [000084] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-2 [000085] user= 0.00% sys= 0.00% alloc= 181b/s - HintsWriteExecutor:1 [000091] user= 0.00% sys= 0.00% alloc= 0b/s - PO-thread-0 [000092] user= 0.00% sys= 0.00% alloc= 0b/s - NodeHealthPlugin-Scheduler-thread-0 [000093] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-1 [000094] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-2 [000097] user= 0.00% sys= 0.00% alloc= 0b/s - Lease RemoteMessageServer acceptor-2-1 [000104] user= 0.00% sys= 0.00% alloc= 0b/s - RemoteMessageClient worker-4-1 [000120] user= 0.00% sys= 0.00% alloc= 0b/s - RemoteMessageClient connection limiter - 0 [000121] user= 0.00% sys= 0.00% alloc= 0b/s - threadDeathWatcher-5-1 [000122] user= 0.00% sys= 0.00% alloc= 0b/s - PO-thread scheduler [000123] user= 0.00% sys= 0.00% alloc= 0b/s - JOB-TRACKER [000124] user= 0.00% sys= 0.01% alloc= 1276b/s - TASK-TRACKER [000127] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-1 [000128] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-2 [000129] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-3 [000130] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-4 [000131] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-5 [000132] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-6 [000133] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-7 [000134] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-8 [000135] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-9 [000136] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-10 [000137] user= 0.19% sys=-0.18% alloc= 0b/s - epollEventLoopGroup-6-11 [000138] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-12 [000139] user= 0.19% sys=-0.19% alloc= 0b/s - epollEventLoopGroup-6-13 [000140] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-14 [000141] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-15 [000142] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-16 [000143] user= 0.19% sys=-0.04% alloc= 13kb/s - Thread-7 [000144] user= 0.00% sys= 0.00% alloc= 0b/s - taskCleanup [000145] user= 0.00% sys= 0.00% alloc= 0b/s - DseGossipStateUpdater [000146] user= 0.00% sys= 0.00% alloc= 0b/s - DestroyJavaVM [000149] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-10 [000150] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-11 [000151] user= 0.00% sys= 0.00% alloc= 0b/s - Directory/File cleanup thread [000153] user= 0.00% sys= 0.00% alloc= 0b/s - pool-15-thread-1 [000190] user= 0.00% sys= 0.00% alloc= 0b/s - pool-18-thread-1 [000215] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-3 [000217] user= 0.00% sys= 0.00% alloc= 0b/s - RMI Scheduler(0) [000220] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Connection(335)-10.200.182.146 [000222] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-4 [000223] user= 0.00% sys= 0.00% alloc= 0b/s - taskCleanup [000224] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-69 [000225] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-70 [000227] user= 0.00% sys= 0.00% alloc= 0b/s - pool-19-thread-1 [000254] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-9 [000255] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-11 [000256] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-10 [000269] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-13 [000270] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-12 [000272] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-14 [000273] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-15 [000274] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-18 [000275] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-19 [000276] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-20 [000277] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-17 [000278] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-16 [000279] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-22 [000280] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-21 [000281] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-23 [000282] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-24 [000283] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-25 [000284] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-26 [000285] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-27 [000286] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-28 [000287] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-29 [000288] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-30 [000289] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-31 [000290] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-32 [000296] user= 0.00% sys= 0.00% alloc= 1970b/s - pool-2-thread-1 [000297] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-33 [000298] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-34 [000302] user= 0.00% sys= 0.01% alloc= 1576b/s - MessagingService-Incoming-/10.200.182.145 [000303] user= 0.00% sys= 0.00% alloc= 451b/s - MessagingService-Outgoing-/10.200.182.145 [000304] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Outgoing-/10.200.182.145 [000305] user= 0.00% sys= 0.01% alloc= 206b/s - MessagingService-Outgoing-/10.200.182.145 [000308] user= 0.00% sys= 0.00% alloc= 424b/s - MessagingService-Incoming-/10.200.182.145 [000314] user= 0.00% sys= 0.00% alloc= 0b/s - StreamingTransferTaskTimeouts:1 [000324] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Outgoing-/10.200.182.146 [000325] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Outgoing-/10.200.182.146 [000326] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Outgoing-/10.200.182.146 [000328] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Incoming-/10.200.182.146 [000329] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-35 [000330] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-37 [000331] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-36 [000332] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-39 [000333] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-38 [000334] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-42 [000335] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-41 [000336] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-40 [000337] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-46 [000338] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-44 [000339] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-45 [000340] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-43 [000341] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-47 [000342] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-48 [000343] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-50 [000344] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-49 [000375] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:1 [000376] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:2 [000406] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Incoming-/10.200.182.145 [000408] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Incoming-/10.200.182.146 [000409] user= 0.00% sys= 0.00% alloc= 0b/s - MessagingService-Incoming-/10.200.182.144 [000415] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:3 [000418] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:4 [000435] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:5 [000438] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:6 [000439] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:7 [000444] user= 0.00% sys= 0.00% alloc= 0b/s - StreamConnectionEstablisher:8 [000687] user= 0.00% sys= 0.00% alloc= 0b/s - JMX server connection timeout 687 [000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP Connection(401)-10.200.182.146 [000694] user= 0.00% sys= 0.00% alloc= 0b/s - Attach Listener [000726] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Connection(400)-10.200.182.146 [000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112 [000745] user= 0.00% sys= 0.00% alloc= 0b/s - MemtableFlushWriter:113 [000746] user= 0.00% sys= 0.03% alloc= 4295b/s - JMX server connection timeout 746
Attachments
Attachments
Issue Links
- is duplicated by
-
CASSANDRA-11447 Flush writer deadlock in Cassandra 2.2.5
- Resolved