CASSANDRA-11447: Flush writer deadlock in Cassandra 2.2.5


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Severity: Normal

    Description

      When writing heavily to one of my Cassandra tables, I got a deadlock similar to CASSANDRA-9882:

      "MemtableFlushWriter:4589" #34721 daemon prio=5 os_prio=0 tid=0x0000000005fc11d0 nid=0x7664 waiting for monitor entry [0x00007fb83f0e5000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:266)
              - waiting to lock <0x0000000400956258> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
              at org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:400)
              at org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:332)
              at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:235)
              at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1580)
              at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:362)
              at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
              at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
              at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1139)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
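
      For illustration, here is a minimal, self-contained Java sketch of the kind of lock-ordering deadlock that a BLOCKED, "waiting to lock" trace like the one above can indicate. The thread and lock names are hypothetical stand-ins chosen for this sketch, not Cassandra's actual classes.

      // Hypothetical sketch of a two-lock ordering deadlock: each thread takes one
      // monitor and then blocks forever waiting for the monitor the other thread holds.
      public class LockOrderDeadlockSketch {
          private static final Object strategyMonitor = new Object(); // stand-in for the strategy lock
          private static final Object trackerMonitor  = new Object(); // stand-in for a second shared lock

          public static void main(String[] args) throws InterruptedException {
              Thread flushWriter = new Thread(() -> {
                  synchronized (trackerMonitor) {          // takes lock A first...
                      sleepQuietly(100);
                      synchronized (strategyMonitor) {     // ...then waits for lock B
                          System.out.println("flush writer done");
                      }
                  }
              }, "FlushWriter-sketch");

              Thread compactor = new Thread(() -> {
                  synchronized (strategyMonitor) {         // takes lock B first...
                      sleepQuietly(100);
                      synchronized (trackerMonitor) {      // ...then waits for lock A -> deadlock
                          System.out.println("compactor done");
                      }
                  }
              }, "Compactor-sketch");

              flushWriter.start();
              compactor.start();
              flushWriter.join(2000);
              compactor.join(2000);
              // Both threads are now BLOCKED on each other's monitor, which is
              // what jstack reports as "waiting to lock <...>".
              System.out.println("flush writer state: " + flushWriter.getState());
              System.out.println("compactor state:    " + compactor.getState());
          }

          private static void sleepQuietly(long ms) {
              try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
          }
      }

      Taking a jstack of this toy program shows two threads stuck in the BLOCKED state, much like the MemtableFlushWriter thread above.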
      

      The compaction strategies in this keyspace are mixed: one table uses LCS and the rest use DTCS. None of the tables here, apart from the LCS one, seems to have a large SSTable count:

      		Table                      SSTable count
      		active_counters            2
      		aggregation_job_entries    2
      		dsp_metrics_log            207
      		dsp_metrics_ts_5min        3
      		dsp_metrics_ts_day         2
      		dsp_metrics_ts_hour        2
      

      Yet the symptoms are similar.

      The "dsp_metrics_ts_5min" table had had a major compaction shortly before all this to get rid of the 400+ SStable files before this system went into use, but they should have been eliminated.

      Have other people seen this? I am attaching a stack trace.
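
      Besides reading the jstack output by hand, the standard ThreadMXBean API can report Java-level monitor deadlocks directly. The sketch below is generic JMX code run inside (or attached to) the JVM in question, not anything Cassandra itself ships; it only shows how the blocked/owner relationships in the attached trace could be confirmed programmatically.

      import java.lang.management.ManagementFactory;
      import java.lang.management.ThreadInfo;
      import java.lang.management.ThreadMXBean;

      public class DeadlockCheck {
          public static void main(String[] args) {
              ThreadMXBean bean = ManagementFactory.getThreadMXBean();
              long[] ids = bean.findDeadlockedThreads(); // null when no cycle of blocked threads exists
              if (ids == null) {
                  System.out.println("No deadlocked threads found");
                  return;
              }
              // Request locked monitors and ownable synchronizers so the report
              // shows which thread owns the lock each blocked thread is waiting for.
              for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
                  System.out.printf("%s is %s on %s, held by %s%n",
                          info.getThreadName(),
                          info.getThreadState(),
                          info.getLockName(),
                          info.getLockOwnerName());
              }
          }
      }

      (jstack itself also appends a "Found one Java-level deadlock" section when it can detect such a cycle.)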

      Thanks!

      Attachments

        1. cassandra.jstack.out (244 kB, uploaded by Mark Manley)


            People

              Assignee: Unassigned
              Reporter: Mark Manley (mwmanley)
              Votes: 0
              Watchers: 2
