Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-11373

Cancelled compaction leading to infinite loop in compaction strategy getNextBackgroundTask

Agile BoardAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Normal

    Description

      Our test is basically running nodetool repair on specific keyspaces (such as keyspace1) and the test is also triggering nodetool compact keyspace1 standard1 in the background.
      And so it looks like running major compactions & repairs lead to that issue when using LCS.

      Below is an excerpt from the thread dump (the rest is attached)

      "CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 nid=0x3c4e waiting for monitor entry [0x00007f53340d8000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252)
      	- waiting to lock <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
      	at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434)
      	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550)
      	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      
         Locked ownable synchronizers:
      	- <0x00000006c9362ca8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      "CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 nid=0x3c4d runnable [0x00007f5334119000]
         java.lang.Thread.State: RUNNABLE
      	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650)
      	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
      	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
      	at com.google.common.collect.Iterators.addAll(Iterators.java:361)
      	at com.google.common.collect.Iterables.addAll(Iterables.java:354)
      	at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589)
      	at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349)
      	- locked <0x00000006d0f7a6a8> (a org.apache.cassandra.db.compaction.LeveledManifest)
      	at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98)
      	- locked <0x00000006d0f7a568> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
      	at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95)
      	- locked <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
      	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      CPU usage is at 100%

      top -p 15386
      top - 12:12:40 up  1:28,  1 user,  load average: 1.08, 1.11, 1.16
      Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.3 us,  0.0 sy, 12.4 ni, 87.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem:  16433792 total,  8947336 used,  7486456 free,    89552 buffers
      KiB Swap:        0 total,        0 used,        0 free.  3326796 cached Mem
      
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      15386 automat+  20   0 7891448 5.004g 290184 S 102.9 31.9  80:07.06 java
      
      

      ttop shows that the compaction thread consumes all the CPU

      $ java -jar sjk.jar ttop -p 15386
      Monitoring threads ...
      
      2016-03-17T12:17:13.514+0000 Process summary
        process cpu=126.34%
        application cpu=102.81% (user=102.46% sys=0.35%)
        other: cpu=23.53%
        heap allocation rate 375mb/s
      [000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
      [000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
      [000005] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
      [000012] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-7199
      [000013] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
      [000015] user= 0.00% sys= 0.00% alloc=   476b/s - AsyncAppender-Worker-ASYNCDEBUGLOG
      [000016] user= 0.00% sys= 0.05% alloc=  1070b/s - ScheduledTasks:1
      [000017] user= 0.00% sys= 0.00% alloc=    33b/s - EXPIRING-MAP-REAPER:1
      [000018] user= 0.00% sys= 0.02% alloc=  1932b/s - Background_Reporter:1
      [000022] user= 0.00% sys= 0.00% alloc=     0b/s - MemtablePostFlush:1
      [000023] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableReclaimMemory:1
      [000026] user= 0.00% sys= 0.00% alloc=     0b/s - SlabPoolCleaner
      [000027] user= 0.00% sys= 0.00% alloc=     0b/s - PERIODIC-COMMIT-LOG-SYNCER
      [000028] user= 0.00% sys= 0.00% alloc=     0b/s - COMMIT-LOG-ALLOCATOR
      [000029] user= 0.00% sys= 0.01% alloc=  7086b/s - OptionalTasks:1
      [000030] user= 0.00% sys= 0.00% alloc=     0b/s - Reference-Reaper:1
      [000031] user= 0.00% sys= 0.00% alloc=     0b/s - Strong-Reference-Leak-Detector:1
      [000032] user=99.45% sys= 0.07% alloc=  374mb/s - CompactionExecutor:1
      [000033] user= 0.00% sys= 0.00% alloc=     0b/s - CompactionExecutor:2
      [000036] user= 0.00% sys= 0.00% alloc=     0b/s - NonPeriodicTasks:1
      [000037] user= 0.00% sys= 0.00% alloc=     0b/s - LocalPool-Cleaner:1
      [000041] user= 0.00% sys= 0.00% alloc=     0b/s - IndexSummaryManager:1
      [000043] user= 0.00% sys= 0.01% alloc=  2705b/s - GossipTasks:1
      [000044] user= 0.00% sys= 0.00% alloc=     0b/s - ACCEPT-/10.200.182.146
      [000045] user= 0.00% sys= 0.01% alloc=  2283b/s - BatchlogTasks:1
      [000055] user= 0.00% sys= 0.02% alloc=  9494b/s - GossipStage:1
      [000056] user= 0.00% sys= 0.00% alloc=     0b/s - AntiEntropyStage:1
      [000057] user= 0.00% sys= 0.00% alloc=     0b/s - MigrationStage:1
      [000058] user= 0.00% sys= 0.00% alloc=     0b/s - MiscStage:1
      [000067] user= 0.00% sys= 0.02% alloc=  2445b/s - MessagingService-Incoming-/10.200.182.144
      [000068] user= 0.00% sys= 0.01% alloc=   968b/s - MessagingService-Outgoing-/10.200.182.144
      [000069] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.144
      [000070] user= 0.00% sys= 0.02% alloc=   512b/s - MessagingService-Outgoing-/10.200.182.144
      [000072] user= 0.00% sys= 0.00% alloc=     0b/s - NanoTimeToCurrentTimeMillis updater
      [000073] user= 0.00% sys= 0.02% alloc=  3113b/s - MessagingService-Incoming-/10.200.182.144
      [000074] user= 0.00% sys= 0.00% alloc=     0b/s - PendingRangeCalculator:1
      [000075] user= 0.00% sys= 0.41% alloc=   66kb/s - SharedPool-Worker-1
      [000076] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-2
      [000077] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-3
      [000078] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-4
      [000079] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-5
      [000080] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-6
      [000081] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-7
      [000082] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-8
      [000084] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-2
      [000085] user= 0.00% sys= 0.00% alloc=   181b/s - HintsWriteExecutor:1
      [000091] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread-0
      [000092] user= 0.00% sys= 0.00% alloc=     0b/s - NodeHealthPlugin-Scheduler-thread-0
      [000093] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-1
      [000094] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-2
      [000097] user= 0.00% sys= 0.00% alloc=     0b/s - Lease RemoteMessageServer acceptor-2-1
      [000104] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient worker-4-1
      [000120] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient connection limiter - 0
      [000121] user= 0.00% sys= 0.00% alloc=     0b/s - threadDeathWatcher-5-1
      [000122] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread scheduler
      [000123] user= 0.00% sys= 0.00% alloc=     0b/s - JOB-TRACKER
      [000124] user= 0.00% sys= 0.01% alloc=  1276b/s - TASK-TRACKER
      [000127] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-1
      [000128] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-2
      [000129] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-3
      [000130] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-4
      [000131] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-5
      [000132] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-6
      [000133] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-7
      [000134] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-8
      [000135] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-9
      [000136] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-10
      [000137] user= 0.19% sys=-0.18% alloc=     0b/s - epollEventLoopGroup-6-11
      [000138] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-12
      [000139] user= 0.19% sys=-0.19% alloc=     0b/s - epollEventLoopGroup-6-13
      [000140] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-14
      [000141] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-15
      [000142] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-16
      [000143] user= 0.19% sys=-0.04% alloc=   13kb/s - Thread-7
      [000144] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
      [000145] user= 0.00% sys= 0.00% alloc=     0b/s - DseGossipStateUpdater
      [000146] user= 0.00% sys= 0.00% alloc=     0b/s - DestroyJavaVM
      [000149] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-10
      [000150] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-11
      [000151] user= 0.00% sys= 0.00% alloc=     0b/s - Directory/File cleanup thread
      [000153] user= 0.00% sys= 0.00% alloc=     0b/s - pool-15-thread-1
      [000190] user= 0.00% sys= 0.00% alloc=     0b/s - pool-18-thread-1
      [000215] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-3
      [000217] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)
      [000220] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(335)-10.200.182.146
      [000222] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-4
      [000223] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
      [000224] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-69
      [000225] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-70
      [000227] user= 0.00% sys= 0.00% alloc=     0b/s - pool-19-thread-1
      [000254] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-9
      [000255] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-11
      [000256] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-10
      [000269] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-13
      [000270] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-12
      [000272] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-14
      [000273] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-15
      [000274] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-18
      [000275] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-19
      [000276] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-20
      [000277] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-17
      [000278] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-16
      [000279] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-22
      [000280] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-21
      [000281] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-23
      [000282] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-24
      [000283] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-25
      [000284] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-26
      [000285] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-27
      [000286] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-28
      [000287] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-29
      [000288] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-30
      [000289] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-31
      [000290] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-32
      [000296] user= 0.00% sys= 0.00% alloc=  1970b/s - pool-2-thread-1
      [000297] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-33
      [000298] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-34
      [000302] user= 0.00% sys= 0.01% alloc=  1576b/s - MessagingService-Incoming-/10.200.182.145
      [000303] user= 0.00% sys= 0.00% alloc=   451b/s - MessagingService-Outgoing-/10.200.182.145
      [000304] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.145
      [000305] user= 0.00% sys= 0.01% alloc=   206b/s - MessagingService-Outgoing-/10.200.182.145
      [000308] user= 0.00% sys= 0.00% alloc=   424b/s - MessagingService-Incoming-/10.200.182.145
      [000314] user= 0.00% sys= 0.00% alloc=     0b/s - StreamingTransferTaskTimeouts:1
      [000324] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000325] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000326] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
      [000328] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
      [000329] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-35
      [000330] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-37
      [000331] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-36
      [000332] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-39
      [000333] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-38
      [000334] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-42
      [000335] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-41
      [000336] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-40
      [000337] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-46
      [000338] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-44
      [000339] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-45
      [000340] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-43
      [000341] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-47
      [000342] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-48
      [000343] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-50
      [000344] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-49
      [000375] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:1
      [000376] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:2
      [000406] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.145
      [000408] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
      [000409] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.144
      [000415] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:3
      [000418] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:4
      [000435] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:5
      [000438] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:6
      [000439] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:7
      [000444] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:8
      [000687] user= 0.00% sys= 0.00% alloc=     0b/s - JMX server connection timeout 687
      [000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP Connection(401)-10.200.182.146
      [000694] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
      [000726] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(400)-10.200.182.146
      [000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112
      [000745] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableFlushWriter:113
      [000746] user= 0.00% sys= 0.03% alloc=  4295b/s - JMX server connection timeout 746
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            marcuse Marcus Eriksson Assign to me
            eduard.tudenhoefner Eduard Tudenhoefner
            Marcus Eriksson
            Benedict Elliott Smith
            Votes:
            1 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment