CASSANDRA-13948

Reload compaction strategies when JBOD disk boundary changes

    Details

    • Severity:
      Normal

      Description

      The thread dump below shows a race between an sstable replacement by IndexSummaryRedistribution and AbstractCompactionStrategy.getNextBackgroundTask:

      Thread 94580: (state = BLOCKED)
       - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
       - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
       - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
       - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
       - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
       - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
       - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
       - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
       - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
       - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
       - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
       - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
       - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
       - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
       - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
      
      Thread 94573: (state = IN_JAVA)
       - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled frame; information may be imprecise)
       - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View) @bci=20, line=268 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, line=265 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate, com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
       - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled frame)
       - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int) @bci=84, line=139 (Compiled frame)
       - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int) @bci=105, line=119 (Interpreted frame)
       - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() @bci=84, line=265 (Interpreted frame)
       - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
       - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
       - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
       - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
       - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
      

      This particular node remained in this state forever, indicating that LeveledCompactionStrategy.getNextBackgroundTask was looping indefinitely.

      The IndexSummaryRedistribution thread replaced the sstable references on the tracker, so AbstractCompactionStrategy.getNextBackgroundTask could not create a transaction with the old references. At the same time, IndexSummaryRedistribution could not update the sstable references in the compaction strategy, because AbstractCompactionStrategy.getNextBackgroundTask was holding the CompactionStrategyManager lock.
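
      The interaction described above can be modeled with a small, self-contained Java sketch. This is not Cassandra code: the class StaleReferenceRaceSketch, its fields, and its methods are all hypothetical stand-ins. It also assumes that the compaction side holds the read side of a ReentrantReadWriteLock, whereas the dump only shows the redistribution thread waiting for the write lock. The retry loop is bounded by maxAttempts and the notifier uses tryLock so that the sketch terminates; the behaviour reported here corresponds to an unbounded loop and a blocking lock().

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the race; every name here is hypothetical, not Cassandra code. */
public class StaleReferenceRaceSketch {
    // Stand-in for the tracker's view of live sstable references.
    private final AtomicReference<Set<String>> trackerView =
            new AtomicReference<Set<String>>(Collections.singleton("sstable-1@v1"));
    // Stand-in for the compaction strategy's own copy of those references.
    private final Set<String> strategyView =
            new HashSet<String>(Collections.singleton("sstable-1@v1"));
    // Stand-in for the CompactionStrategyManager lock (assumed read/write split).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Redistribution side, step 1: swap the reference on the tracker. */
    public void replaceOnTracker(String oldRef, String newRef) {
        Set<String> next = new HashSet<String>(trackerView.get());
        next.remove(oldRef);
        next.add(newRef);
        trackerView.set(next);
    }

    /** Redistribution side, step 2: update the strategy; needs the write lock. */
    public boolean tryNotifyStrategy(String oldRef, String newRef) {
        if (!lock.writeLock().tryLock()) {
            return false; // a blocking lock() would park here instead
        }
        try {
            strategyView.remove(oldRef);
            strategyView.add(newRef);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** A transaction can only cover references the tracker still holds. */
    private boolean tryModify(Set<String> candidates) {
        return trackerView.get().containsAll(candidates);
    }

    /**
     * Compaction side: retries while holding the read lock. maxAttempts exists
     * only so this sketch terminates. Returns the 1-based attempt that
     * succeeded, or -1 if every attempt failed.
     */
    public int getNextBackgroundTask(int maxAttempts, Runnable whileLocked) {
        lock.readLock().lock();
        try {
            whileLocked.run(); // lets the caller interleave the other thread
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (tryModify(strategyView)) {
                    return attempt;
                }
            }
            return -1;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs tryNotifyStrategy on a separate thread and waits for its result. */
    public boolean notifyFromOtherThread(String oldRef, String newRef) {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = tryNotifyStrategy(oldRef, newRef));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        StaleReferenceRaceSketch s = new StaleReferenceRaceSketch();
        s.replaceOnTracker("sstable-1@v1", "sstable-1@v2");
        boolean[] notified = new boolean[1];
        int stuck = s.getNextBackgroundTask(1000,
                () -> notified[0] = s.notifyFromOtherThread("sstable-1@v1", "sstable-1@v2"));
        System.out.println("notified while locked: " + notified[0] + ", attempt: " + stuck);
        // Once the lock is released, the notification goes through and a retry succeeds.
        boolean later = s.notifyFromOtherThread("sstable-1@v1", "sstable-1@v2");
        System.out.println("notified after release: " + later
                + ", attempt: " + s.getNextBackgroundTask(1000, () -> {}));
    }
}
```

      In this model, every attempt made while the lock is held fails, because the only thread that could refresh the strategy's references is the one kept out by that lock. Progress resumes only after the lock is released.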

        Attachments

        1. 3.11-13948-dtest.png
          67 kB
          Paulo Motta (Deprecated)
        2. trunk-13948-testall.png
          43 kB
          Paulo Motta (Deprecated)
        3. trunk-13948-dtest.png
          41 kB
          Paulo Motta (Deprecated)
        4. 3.11-13948-testall.png
          26 kB
          Paulo Motta (Deprecated)
        5. 13948dtest.png
          178 kB
          Paulo Motta (Deprecated)
        6. 13948testall.png
          43 kB
          Paulo Motta (Deprecated)
        7. dtest2.png
          193 kB
          Paulo Motta (Deprecated)
        8. dtest13948.png
          199 kB
          Paulo Motta (Deprecated)
        9. threaddump-cleanup.txt
          386 kB
          Loic Lambiel
        10. threaddump.txt
          403 kB
          Loic Lambiel
        11. trace.log
          7.01 MB
          Loic Lambiel
        12. debug.log
          4.09 MB
          Dan Kinder

              People

              • Assignee:
                pauloricardomg Paulo Motta (Deprecated)
                Reporter:
                pauloricardomg Paulo Motta (Deprecated)
                Authors:
                Paulo Motta (Deprecated)
                Reviewers:
                Marcus Eriksson
              • Votes:
                1
                Watchers:
                9

                Dates

                • Created:
                  Updated:
                  Resolved: