Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-11373

Cancelled compaction leading to infinite loop in compaction strategy getNextBackgroundTask

    XMLWordPrintableJSON

    Details

    • Severity:
      Normal
    • Since Version:

      Description

      Our test is basically running nodetool repair on specific keyspaces (such as keyspace1) and the test is also triggering nodetool compact keyspace1 standard1 in the background.
      And so it looks like running major compactions & repairs lead to that issue when using LCS.

      Below is an excerpt from the thread dump (the rest is attached)

      "CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 nid=0x3c4e waiting for monitor entry [0x00007f53340d8000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252)
      	- waiting to lock <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
      	at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434)
      	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550)
      	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      
         Locked ownable synchronizers:
      	- <0x00000006c9362ca8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      "CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 nid=0x3c4d runnable [0x00007f5334119000]
         java.lang.Thread.State: RUNNABLE
      	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650)
      	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
      	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
      	at com.google.common.collect.Iterators.addAll(Iterators.java:361)
      	at com.google.common.collect.Iterables.addAll(Iterables.java:354)
      	at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589)
      	at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349)
      	- locked <0x00000006d0f7a6a8> (a org.apache.cassandra.db.compaction.LeveledManifest)
      	at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98)
      	- locked <0x00000006d0f7a568> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
      	at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95)
      	- locked <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
      	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      CPU usage is at 100%

      top -p 15386
      top - 12:12:40 up  1:28,  1 user,  load average: 1.08, 1.11, 1.16
      Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.3 us,  0.0 sy, 12.4 ni, 87.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem:  16433792 total,  8947336 used,  7486456 free,    89552 buffers
      KiB Swap:        0 total,        0 used,        0 free.  3326796 cached Mem
      
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      15386 automat+  20   0 7891448 5.004g 290184 S 102.9 31.9  80:07.06 java
      
      

      ttop shows that the compaction thread consumes all the CPU

      $ java -jar sjk.jar ttop -p 15386
      Monitoring threads ...
      
      2016-03-17T12:17:13.514+0000 Process summary
        process cpu=126.34%
        application cpu=102.81% (user=102.46% sys=0.35%)
        other: cpu=23.53%
        heap allocation rate 375mb/s
      [000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
      [000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
      [000005] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
      [000012] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-7199
      [000013] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
      [000015] user= 0.00% sys= 0.00% alloc=   476b/s - AsyncAppender-Worker-ASYNCDEBUGLOG
      [000016] user= 0.00% sys= 0.05% alloc=  1070b/s - ScheduledTasks:1
      [000017] user= 0.00% sys= 0.00% alloc=    33b/s - EXPIRING-MAP-REAPER:1
      [000018] user= 0.00% sys= 0.02% alloc=  1932b/s - Background_Reporter:1
      [000022] user= 0.00% sys= 0.00% alloc=     0b/s - MemtablePostFlush:1
      [000023] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableReclaimMemory:1
      [000026] user= 0.00% sys= 0.00% alloc=     0b/s - SlabPoolCleaner
      [000027] user= 0.00% sys= 0.00% alloc=     0b/s - PERIODIC-COMMIT-LOG-SYNCER
      [000028] user= 0.00% sys= 0.00% alloc=     0b/s - COMMIT-LOG-ALLOCATOR
      [000029] user= 0.00% sys= 0.01% alloc=  7086b/s - OptionalTasks:1
      [000030] user= 0.00% sys= 0.00% alloc=     0b/s - Reference-Reaper:1
      [000031] user= 0.00% sys= 0.00% alloc=     0b/s - Strong-Reference-Leak-Detector:1
      [000032] user=99.45% sys= 0.07% alloc=  374mb/s - CompactionExecutor:1
      [000033] user= 0.00% sys= 0.00% alloc=     0b/s - CompactionExecutor:2
      [000036] user= 0.00% sys= 0.00% alloc=     0b/s - NonPeriodicTasks:1
      [000037] user= 0.00% sys= 0.00% alloc=     0b/s - LocalPool-Cleaner:1
      [000041] user= 0.00% sys= 0.00% alloc=     0b/s - IndexSummaryManager:1
      [000043] user= 0.00% sys= 0.01% alloc=  2705b/s - GossipTasks:1
      [000044] user= 0.00% sys= 0.00% alloc=     0b/s - ACCEPT-/10.200.182.146
      [000045] user= 0.00% sys= 0.01% alloc=  2283b/s - BatchlogTasks:1
      [000055] user= 0.00% sys= 0.02% alloc=  9494b/s - GossipStage:1
      [000056] user= 0.00% sys= 0.00% alloc=     0b/s - AntiEntropyStage:1
      [000057] user= 0.00% sys= 0.00% alloc=     0b/s - MigrationStage:1
      [000058] user= 0.00% sys= 0.00% alloc=     0b/s - MiscStage:1
      [000067] user= 0.00% sys= 0.02% alloc=  2445b/s - MessagingService-Incoming-/10.200.182.144
      [000068] user= 0.00% sys= 0.01% alloc=   968b/s - MessagingService-Outgoing-/10.200.182.144
      [000069] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.144
      [000070] user= 0.00% sys= 0.02% alloc=   512b/s - MessagingService-Outgoing-/10.200.182.144
      [000072] user= 0.00% sys= 0.00% alloc=     0b/s - NanoTimeToCurrentTimeMillis updater
      [000073] user= 0.00% sys= 0.02% alloc=  3113b/s - MessagingService-Incoming-/10.200.182.144
      [000074] user= 0.00% sys= 0.00% alloc=     0b/s - PendingRangeCalculator:1
      [000075] user= 0.00% sys= 0.41% alloc=   66kb/s - SharedPool-Worker-1
      [000076] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-2
      [000077] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-3
      [000078] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-4
      [000079] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-5
      [000080] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-6
      [000081] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-7
      [000082] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-8
      [000084] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-2
      [000085] user= 0.00% sys= 0.00% alloc=   181b/s - HintsWriteExecutor:1
      [000091] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread-0
      [000092] user= 0.00% sys= 0.00% alloc=     0b/s - NodeHealthPlugin-Scheduler-thread-0
      [000093] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-1
      [000094] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-2
      [000097] user= 0.00% sys= 0.00% alloc=     0b/s - Lease RemoteMessageServer acceptor-2-1
      [000104] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient worker-4-1
      [000120] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient connection limiter - 0
      [000121] user= 0.00% sys= 0.00% alloc=     0b/s - threadDeathWatcher-5-1
      [000122] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread scheduler
      [000123] user= 0.00% sys= 0.00% alloc=     0b/s - JOB-TRACKER
      [000124] user= 0.00% sys= 0.01% alloc=  1276b/s - TASK-TRACKER
      [000127] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-1
      [000128] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-2
      [000129] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-3
      [000130] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-4
      [000131] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-5
      [000132] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-6
      [000133] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-7
      [000134] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-8
      [000135] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-9
      [000136] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-10
      [000137] user= 0.19% sys=-0.18% alloc=     0b/s - epollEventLoopGroup-6-11
      [000138] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-12
      [000139] user= 0.19% sys=-0.19% alloc=     0b/s - epollEventLoopGroup-6-13
      [000140] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-14
      [000141] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-15
      [000142] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-16
      [000143] user= 0.19% sys=-0.04% alloc=   13kb/s - Thread-7
      [000144] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
      [000145] user= 0.00% sys= 0.00% alloc=     0b/s - DseGossipStateUpdater
      [000146] user= 0.00% sys= 0.00% alloc=     0b/s - DestroyJavaVM
      [000149] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-10
      [000150] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-11
      [000151] user= 0.00% sys= 0.00% alloc=     0b/s - Directory/File cleanup thread
      [000153] user= 0.00% sys= 0.00% alloc=     0b/s - pool-15-thread-1
      [000190] user= 0.00% sys= 0.00% alloc=     0b/s - pool-18-thread-1
      [000215] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-3
      [000217] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)
      [000220] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(335)-10.200.182.146
      [000222] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-4
      [000223] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
      [000224] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-69
      [000225] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-70
      [000227] user= 0.00% sys= 0.00% alloc=     0b/s - pool-19-thread-1
      [000254] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-9
      [000255] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-11
      [000256] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-10
      [000269] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-13
      [000270] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-12
      [000272] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-14
      [000273] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-15
      [000274] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-18
      [000275] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-19
      [000276] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-20
      [000277] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-17
      [000278] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-16
      [000279] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-22
      [000280] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-21
      [000281] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-23
      [000282] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-24
      [000283] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-25
      [000284] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-26
      [000285] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-27
      [000286] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-28
      [000287] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-29
      [000288] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-30
      [000289] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-31
      [000290] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-32
      [000296] user= 0.00% sys= 0.00% alloc=  1970b/s - pool-2-thread-1
      [000297] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-33
      [000298] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-34
      [000302] user= 0.00% sys= 0.01% alloc=  1576b/s - MessagingService-Incoming-/10.200.182.145
      [000303] user= 0.00% sys= 0.00% alloc=   451b/s - MessagingService-Outgoing-/10.200.182.145
      [000304] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.145
      [000305] user= 0.00% sys= 0.01% alloc=   206b/s - MessagingService-Outgoing-/10.200.182.145
      [000308] user= 0.00% sys= 0.00% alloc=   424b/s - MessagingService-Incoming-/10.200.182.145
      [000314] user= 0.00% sys= 0.00% alloc=     0b/s - StreamingTransferTaskTimeouts:1
      [000324] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000325] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000326] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000328] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
      [000329] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-35
      [000330] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-37
      [000331] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-36
      [000332] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-39
      [000333] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-38
      [000334] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-42
      [000335] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-41
      [000336] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-40
      [000337] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-46
      [000338] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-44
      [000339] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-45
      [000340] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-43
      [000341] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-47
      [000342] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-48
      [000343] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-50
      [000344] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-49
      [000375] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:1
      [000376] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:2
      [000406] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.145
      [000408] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
      [000409] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.144
      [000415] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:3
      [000418] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:4
      [000435] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:5
      [000438] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:6
      [000439] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:7
      [000444] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:8
      [000687] user= 0.00% sys= 0.00% alloc=     0b/s - JMX server connection timeout 687
      [000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP Connection(401)-10.200.182.146
      [000694] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
      [000726] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(400)-10.200.182.146
      [000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112
      [000745] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableFlushWriter:113
      [000746] user= 0.00% sys= 0.03% alloc=  4295b/s - JMX server connection timeout 746
      

        Attachments

        1. jstack.txt
          1.22 MB
          Eduard Tudenhoefner

          Issue Links

            Activity

              People

              • Assignee:
                marcuse Marcus Eriksson
                Reporter:
                eduard.tudenhoefner Eduard Tudenhoefner
                Authors:
                Marcus Eriksson
                Reviewers:
                Benedict Elliott Smith
              • Votes:
                1 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: