CASSANDRA-13948

Reload compaction strategies when JBOD disk boundary changes

Details

    • Normal

    Description

      The thread dump below shows a race between an sstable replacement by IndexSummaryRedistribution and AbstractCompactionStrategy.getNextBackgroundTask:

      Thread 94580: (state = BLOCKED)
       - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
       - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
       - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
       - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
       - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
       - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
       - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
       - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
       - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
       - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
       - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
       - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
      
      Thread 94573: (state = IN_JAVA)
       - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled frame; information may be imprecise)
       - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View) @bci=20, line=268 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, line=265 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate, com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled frame)
       - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int) @bci=84, line=139 (Compiled frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int) @bci=105, line=119 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() @bci=84, line=265 (Interpreted frame)
       - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
       - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
       - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
      

      This particular node remained in this state forever, indicating that LeveledCompactionStrategy.getNextBackgroundTask was looping indefinitely.

      What happened is that the IndexSummaryRedistribution thread replaced sstable references on the tracker, so AbstractCompactionStrategy.getNextBackgroundTask could not create a transaction with the old references. At the same time, IndexSummaryRedistribution could not update the sstable references in the compaction strategy, because AbstractCompactionStrategy.getNextBackgroundTask was holding the CompactionStrategyManager lock.
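
      The interaction described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not Cassandra code: the class, fields, and string "references" are hypothetical stand-ins, the compaction side is assumed to hold the read side of a ReentrantReadWriteLock, and a giveUp flag plus a lock timeout are added purely so the demo terminates (the node in this ticket had neither and stayed stuck).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical miniature of the race; none of these names are Cassandra APIs.
public class StaleReferenceLivelockSketch {
    static final String OLD_REF = "sstable-1 (old reference)";
    static final String NEW_REF = "sstable-1 (new reference)";

    // Stand-in for the tracker's live sstable references.
    static final Set<String> trackerLive = ConcurrentHashMap.newKeySet();
    // Stand-in for the references the compaction strategy still knows about.
    static final Set<String> strategyView = ConcurrentHashMap.newKeySet();
    // Stand-in for the CompactionStrategyManager lock (read/write split is assumed).
    static final ReentrantReadWriteLock strategyLock = new ReentrantReadWriteLock();

    // A transaction can only be created if every candidate is still live in the tracker.
    static boolean tryModify(Set<String> candidates) {
        return trackerLive.containsAll(candidates);
    }

    // Compaction side: retries under the read lock. The giveUp flag exists only
    // so this demo can end; the loop in the ticket had no such exit.
    static boolean getNextBackgroundTask(CountDownLatch lockHeld, AtomicBoolean giveUp) {
        strategyLock.readLock().lock();
        try {
            lockHeld.countDown();
            while (!giveUp.get()) {
                if (tryModify(new HashSet<>(strategyView))) {
                    return true; // transaction created
                }
                Thread.yield(); // stale references: retry with the same view
            }
            return false;
        } finally {
            strategyLock.readLock().unlock();
        }
    }

    // Redistribution side: needs the write lock to refresh the strategy's view.
    // The timeout is a demo-only addition; the thread in the dump waited without one.
    static boolean handleListChangedNotification(long timeoutMillis) throws InterruptedException {
        if (!strategyLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // could not deliver the notification
        }
        try {
            strategyView.remove(OLD_REF);
            strategyView.add(NEW_REF);
            return true;
        } finally {
            strategyLock.writeLock().unlock();
        }
    }

    // Returns { notificationDelivered, taskCreated }.
    static boolean[] runScenario() throws Exception {
        trackerLive.clear();
        strategyView.clear();
        strategyView.add(OLD_REF); // strategy still holds the old reference
        trackerLive.add(NEW_REF);  // tracker already holds the replacement

        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicBoolean giveUp = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> compaction = () -> getNextBackgroundTask(lockHeld, giveUp);
            Future<Boolean> task = pool.submit(compaction);
            lockHeld.await(); // compaction thread now holds the read lock
            boolean delivered = handleListChangedNotification(200);
            giveUp.set(true); // release the compaction loop so the demo ends
            boolean created = task.get(5, TimeUnit.SECONDS);
            return new boolean[] { delivered, created };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = runScenario();
        System.out.println("notification delivered: " + result[0]
                + ", task created: " + result[1]);
    }
}
```

      In this sketch neither side makes progress: the notification times out because the read lock is never released, and the compaction loop never succeeds because its view is never refreshed. Without the demo-only flag and timeout, both threads would wait indefinitely, which matches the two states seen in the thread dump.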

      Attachments

        1. 3.11-13948-dtest.png
          67 kB
          Paulo Motta
        2. trunk-13948-testall.png
          43 kB
          Paulo Motta
        3. trunk-13948-dtest.png
          41 kB
          Paulo Motta
        4. 3.11-13948-testall.png
          26 kB
          Paulo Motta
        5. 13948dtest.png
          178 kB
          Paulo Motta
        6. 13948testall.png
          43 kB
          Paulo Motta
        7. dtest2.png
          193 kB
          Paulo Motta
        8. dtest13948.png
          199 kB
          Paulo Motta
        9. threaddump-cleanup.txt
          386 kB
          Loic Lambiel
        10. threaddump.txt
          403 kB
          Loic Lambiel
        11. trace.log
          7.01 MB
          Loic Lambiel
        12. debug.log
          4.09 MB
          Dan Kinder

            People

              pauloricardomg Paulo Motta
              Paulo Motta
              Marcus Eriksson