Cassandra / CASSANDRA-11447

Flush writer deadlock in Cassandra 2.2.5


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Severity: Normal

    Description

      When writing heavily to one of my Cassandra tables, I got a deadlock similar to CASSANDRA-9882:

      "MemtableFlushWriter:4589" #34721 daemon prio=5 os_prio=0 tid=0x0000000005fc11d0 nid=0x7664 waiting for monitor entry [0x00007fb83f0e5000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:266)
              - waiting to lock <0x0000000400956258> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
              at org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:400)
              at org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:332)
              at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:235)
              at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1580)
              at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:362)
              at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
              at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
              at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1139)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
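
      The frame above shows the flush writer waiting on the WrappingCompactionStrategy object monitor inside handleNotification(). Below is a minimal sketch of that contention pattern, not the actual Cassandra source: class, method, and thread names are illustrative, and it assumes (as in CASSANDRA-9882) that some other thread holds the strategy monitor for a long time, for example while selecting compaction candidates over a large SSTable set.

      // Simplified illustration only -- NOT Cassandra code. It shows how a flush
      // writer ends up BLOCKED on a shared, synchronized compaction-strategy object.
      import java.util.ArrayList;
      import java.util.List;

      class CompactionStrategyStub {
          private final List<String> sstables = new ArrayList<>();

          // Stands in for WrappingCompactionStrategy.handleNotification(): flush
          // writers must take the strategy monitor to register a newly flushed SSTable.
          synchronized void handleNotification(String newSSTable) {
              sstables.add(newSSTable);
          }

          // Stands in for long-running work done while holding the same monitor
          // (e.g. compaction candidate selection over many SSTables).
          synchronized void selectCompactionCandidates() throws InterruptedException {
              Thread.sleep(60_000); // monitor is held for the whole sleep
          }
      }

      public class FlushBlockingDemo {
          public static void main(String[] args) throws Exception {
              CompactionStrategyStub strategy = new CompactionStrategyStub();

              Thread compactionThread = new Thread(() -> {
                  try {
                      strategy.selectCompactionCandidates();
                  } catch (InterruptedException ignored) {
                  }
              }, "CompactionExecutor:1");

              Thread flushWriter = new Thread(
                      () -> strategy.handleNotification("sstable-1"),
                      "MemtableFlushWriter:1");

              compactionThread.start();
              Thread.sleep(100);   // let the compaction thread grab the monitor first
              flushWriter.start();
              Thread.sleep(500);
              // Typically prints BLOCKED, mirroring the "waiting to lock <...>
              // (a ...WrappingCompactionStrategy)" frame in the thread dump.
              System.out.println("MemtableFlushWriter state: " + flushWriter.getState());
              compactionThread.interrupt();
          }
      }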
      

      The compaction strategies in this keyspace are mixed: one table uses LCS and the rest use DTCS. None of the tables except the LCS one has a large SSTable count:

      		Table: active_counters
      		SSTable count: 2
      --
      
      		Table: aggregation_job_entries
      		SSTable count: 2
      --
      
      		Table: dsp_metrics_log
      		SSTable count: 207
      --
      
      		Table: dsp_metrics_ts_5min
      		SSTable count: 3
      --
      
      		Table: dsp_metrics_ts_day
      		SSTable count: 2
      --
      
      		Table: dsp_metrics_ts_hour
      		SSTable count: 2
      

      Yet the symptoms are similar.
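
      For anyone wanting to watch these counts programmatically rather than through the filtered cfstats-style output above, here is a hedged sketch that reads the per-table LiveSSTableCount gauge over JMX. The host, port, and keyspace name are placeholders (the keyspace is not named in this ticket), and the MBean naming assumes the 2.2-era metrics layout (type=ColumnFamily); treat it as a starting point, not a verified recipe.

      import javax.management.MBeanServerConnection;
      import javax.management.ObjectName;
      import javax.management.remote.JMXConnector;
      import javax.management.remote.JMXConnectorFactory;
      import javax.management.remote.JMXServiceURL;

      public class SSTableCountCheck {
          public static void main(String[] args) throws Exception {
              // Assumptions: host and keyspace are placeholders; 7199 is Cassandra's default JMX port.
              String host = "localhost";
              String keyspace = "my_keyspace";
              String[] tables = {"active_counters", "aggregation_job_entries", "dsp_metrics_log",
                                 "dsp_metrics_ts_5min", "dsp_metrics_ts_day", "dsp_metrics_ts_hour"};

              JMXServiceURL url = new JMXServiceURL(
                      "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
              JMXConnector connector = JMXConnectorFactory.connect(url);
              try {
                  MBeanServerConnection mbs = connector.getMBeanServerConnection();
                  for (String table : tables) {
                      // Assumed 2.2-era metrics MBean naming; 3.x renames type=ColumnFamily to type=Table.
                      ObjectName gauge = new ObjectName(String.format(
                              "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=%s,scope=%s,name=LiveSSTableCount",
                              keyspace, table));
                      System.out.printf("%s: %s live SSTables%n", table, mbs.getAttribute(gauge, "Value"));
                  }
              } finally {
                  connector.close();
              }
          }
      }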

      The "dsp_metrics_ts_5min" table had had a major compaction shortly before all this to get rid of the 400+ SStable files before this system went into use, but they should have been eliminated.

      Have other people seen this? I am attaching a stack trace.

      Thanks!

Attachments

Issue Links

Activity


People

Assignee: Unassigned
Reporter: Mark Manley (mwmanley)
Votes: 0
Watchers: 2

Dates

Created:
Updated:
Resolved:
