Cassandra / CASSANDRA-11447

Flush writer deadlock in Cassandra 2.2.5


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Severity: Normal

    Description

      When writing heavily to one of my Cassandra tables, I got a deadlock similar to CASSANDRA-9882:

      "MemtableFlushWriter:4589" #34721 daemon prio=5 os_prio=0 tid=0x0000000005fc11d0 nid=0x7664 waiting for monitor entry [0x00007fb83f0e5000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:266)
              - waiting to lock <0x0000000400956258> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
              at org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:400)
              at org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:332)
              at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:235)
              at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1580)
              at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:362)
              at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
              at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
              at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1139)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
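
      The frame above shows the flush writer waiting on the WrappingCompactionStrategy object monitor inside handleNotification(). Below is a minimal sketch of that contention pattern, not the actual Cassandra source: class, method, and thread names are illustrative, and it assumes (as in CASSANDRA-9882) that some other thread holds the strategy monitor for a long time, for example while selecting compaction candidates over a large SSTable set.

      // Simplified illustration only -- NOT Cassandra code. It shows how a flush
      // writer ends up BLOCKED on a shared, synchronized compaction-strategy object.
      import java.util.ArrayList;
      import java.util.List;

      class CompactionStrategyStub {
          private final List<String> sstables = new ArrayList<>();

          // Stands in for WrappingCompactionStrategy.handleNotification(): flush
          // writers must take the strategy monitor to register a newly flushed SSTable.
          synchronized void handleNotification(String newSSTable) {
              sstables.add(newSSTable);
          }

          // Stands in for long-running work done while holding the same monitor
          // (e.g. compaction candidate selection over many SSTables).
          synchronized void selectCompactionCandidates() throws InterruptedException {
              Thread.sleep(60_000); // monitor is held for the whole sleep
          }
      }

      public class FlushBlockingDemo {
          public static void main(String[] args) throws Exception {
              CompactionStrategyStub strategy = new CompactionStrategyStub();

              Thread compactionThread = new Thread(() -> {
                  try {
                      strategy.selectCompactionCandidates();
                  } catch (InterruptedException ignored) {
                  }
              }, "CompactionExecutor:1");

              Thread flushWriter = new Thread(
                      () -> strategy.handleNotification("sstable-1"),
                      "MemtableFlushWriter:1");

              compactionThread.start();
              Thread.sleep(100);   // let the compaction thread grab the monitor first
              flushWriter.start();
              Thread.sleep(500);
              // Typically prints BLOCKED, mirroring the "waiting to lock <...>
              // (a ...WrappingCompactionStrategy)" frame in the thread dump.
              System.out.println("MemtableFlushWriter state: " + flushWriter.getState());
              compactionThread.interrupt();
          }
      }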
      

      The compaction strategies in this keyspace are mixed: one table uses LCS and the rest use DTCS. None of the tables except the LCS one has a large SSTable count:

      		Table: active_counters
      		SSTable count: 2
      --
      
      		Table: aggregation_job_entries
      		SSTable count: 2
      --
      
      		Table: dsp_metrics_log
      		SSTable count: 207
      --
      
      		Table: dsp_metrics_ts_5min
      		SSTable count: 3
      --
      
      		Table: dsp_metrics_ts_day
      		SSTable count: 2
      --
      
      		Table: dsp_metrics_ts_hour
      		SSTable count: 2
      

      Yet the symptoms are similar.
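
      For anyone wanting to watch these counts programmatically rather than through the filtered cfstats-style output above, here is a hedged sketch that reads the per-table LiveSSTableCount gauge over JMX. The host, port, and keyspace name are placeholders (the keyspace is not named in this ticket), and the MBean naming assumes the 2.2-era metrics layout (type=ColumnFamily); treat it as a starting point, not a verified recipe.

      import javax.management.MBeanServerConnection;
      import javax.management.ObjectName;
      import javax.management.remote.JMXConnector;
      import javax.management.remote.JMXConnectorFactory;
      import javax.management.remote.JMXServiceURL;

      public class SSTableCountCheck {
          public static void main(String[] args) throws Exception {
              // Assumptions: host and keyspace are placeholders; 7199 is Cassandra's default JMX port.
              String host = "localhost";
              String keyspace = "my_keyspace";
              String[] tables = {"active_counters", "aggregation_job_entries", "dsp_metrics_log",
                                 "dsp_metrics_ts_5min", "dsp_metrics_ts_day", "dsp_metrics_ts_hour"};

              JMXServiceURL url = new JMXServiceURL(
                      "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
              JMXConnector connector = JMXConnectorFactory.connect(url);
              try {
                  MBeanServerConnection mbs = connector.getMBeanServerConnection();
                  for (String table : tables) {
                      // Assumed 2.2-era metrics MBean naming; 3.x renames type=ColumnFamily to type=Table.
                      ObjectName gauge = new ObjectName(String.format(
                              "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=%s,scope=%s,name=LiveSSTableCount",
                              keyspace, table));
                      System.out.printf("%s: %s live SSTables%n", table, mbs.getAttribute(gauge, "Value"));
                  }
              } finally {
                  connector.close();
              }
          }
      }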

      The "dsp_metrics_ts_5min" table had had a major compaction shortly before all this to get rid of the 400+ SStable files before this system went into use, but they should have been eliminated.

      Have other people seen this? I am attaching a stack trace.

      Thanks!

Attachments

Issue Links

Activity


People

Assignee: Unassigned
Reporter: Mark Manley (mwmanley)
Votes: 0
Watchers: 2

Dates

Created:
Updated:
Resolved:
