Details
Bug
Status: Open
Normal
Resolution: Unresolved
None
This happens on a 7-node system with 2 data centers. We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still occurs.
Normal
Description
We noticed this problem because the commit logs accumulate until we eventually run out of disk space. nodetool tpstats shows several of the thread pools backed up:
Pool Name                     Active   Pending   Completed   Blocked   All time blocked
MutationStage                 0        0         134335315   0         0
ReadStage                     0        0         643986790   0         0
RequestResponseStage          0        0         114298      0         0
ReadRepairStage               0        0         36          0         0
CounterMutationStage          0        0         0           0         0
MiscStage                     0        0         0           0         0
AntiEntropySessions           1        1         79357       0         0
HintedHandoff                 0        0         90          0         0
GossipStage                   0        0         6595098     0         0
CacheCleanupExecutor          0        0         0           0         0
InternalResponseStage         0        0         1638369     0         0
CommitLogArchiver             0        0         0           0         0
CompactionExecutor            2        175       2922542     0         0
ValidationExecutor            0        0         1465374     0         0
MigrationStage                1        76        600         0         0
AntiEntropyStage              1        923       8291098     0         0
PendingRangeCalculator        0        0         20          0         0
Sampler                       0        0         0           0         0
MemtableFlushWriter           0        0         53017       0         0
MemtablePostFlush             1        4584      1545141     0         0
MemtableReclaimMemory         0        0         70639       0         0
Native-Transport-Requests     0        0         352559      0         0
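For anyone comparing this against their own nodes, the backed-up pools can be picked out of the tpstats output mechanically. The following is only a minimal sketch: the class name `TpstatsBacklog`, the method `backedUpPools`, and the threshold of 50 are made up for illustration, and it assumes each data row has exactly six whitespace-separated columns as in the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Cassandra or nodetool.
public class TpstatsBacklog {

    /** Returns "pool=pending" for every row whose Pending column exceeds the threshold. */
    static List<String> backedUpPools(List<String> rows, long pendingThreshold) {
        List<String> result = new ArrayList<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            // Expected columns: name, active, pending, completed, blocked, all time blocked.
            // The header line and blank lines do not have exactly six tokens and are skipped.
            if (cols.length != 6) {
                continue;
            }
            try {
                long pending = Long.parseLong(cols[2]);
                if (pending > pendingThreshold) {
                    result.add(cols[0] + "=" + pending);
                }
            } catch (NumberFormatException e) {
                // Not a data row; ignore it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Pool Name Active Pending Completed Blocked All time blocked",
            "MutationStage 0 0 134335315 0 0",
            "CompactionExecutor 2 175 2922542 0 0",
            "MigrationStage 1 76 600 0 0",
            "AntiEntropyStage 1 923 8291098 0 0",
            "MemtablePostFlush 1 4584 1545141 0 0");
        // Prints [CompactionExecutor=175, MigrationStage=76, AntiEntropyStage=923, MemtablePostFlush=4584]
        System.out.println(backedUpPools(rows, 50));
    }
}
```

With the numbers from this report, MemtablePostFlush stands out with 4584 pending tasks, followed by AntiEntropyStage, CompactionExecutor, and MigrationStage.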
This all starts after the following exception is raised in Cassandra:
ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
java.lang.AssertionError: Interval min > max
    at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1521) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) ~[apache-cassandra-2.1.15.jar:2.1.15]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
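The message suggests that an interval tree built over the sstables rejected an interval whose lower bound sorts after its upper bound. The sketch below only illustrates that kind of invariant and is not Cassandra's code: `IntervalInvariantSketch`, `Interval`, `invertedIntervals`, and `buildTree` are hypothetical names, plain longs stand in for real partition keys, and the assumption that each interval corresponds to one sstable's key range is a guess based on the `SSTableIntervalTree` frames in the trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; the real check lives in org.apache.cassandra.utils.IntervalTree.
public class IntervalInvariantSketch {

    /** Hypothetical stand-in for one sstable's key range. */
    static final class Interval {
        final String name;
        final long min;
        final long max;

        Interval(String name, long min, long max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns the names of intervals whose lower bound is greater than their upper bound. */
    static List<String> invertedIntervals(List<Interval> intervals) {
        List<String> bad = new ArrayList<>();
        for (Interval i : intervals) {
            if (i.min > i.max) {
                bad.add(i.name);
            }
        }
        return bad;
    }

    /** Mimics the failing check: a single inverted interval aborts the whole build. */
    static void buildTree(List<Interval> intervals) {
        for (Interval i : intervals) {
            if (i.min > i.max) {
                throw new AssertionError("Interval min > max");
            }
        }
        // A real implementation would go on to construct the tree here.
    }

    public static void main(String[] args) {
        List<Interval> view = Arrays.asList(
            new Interval("sstable-1", 10, 20),
            new Interval("sstable-2", 35, 30)); // inverted on purpose
        System.out.println(invertedIntervals(view)); // prints [sstable-2]
        try {
            buildTree(view);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints Interval min > max
        }
    }
}
```

If the real check behaves similarly, one bad interval would be enough to fail every subsequent flush that rebuilds the tree, which would fit the regularity seen here.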
This has occurred on only one of our system testers' setups, but with regularity. I couldn't begin to tell you how to reproduce it. We have many systems deployed, but only this one setup encounters the issue. I have included the jstack output, config file, log file, and schema. I also have a heap dump available if needed. After looking at the heap dump, the best I can tell is that the assertion failure left a lock (i.e. a latch) in a locked state, which then causes a backlog of pending tasks.
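The suspected mechanism can be reproduced in isolation. The sketch below is not taken from Cassandra: `StuckLatchSketch`, `failingStep`, and `waiterReleased` are hypothetical names, and it only assumes that some task waits on a `CountDownLatch` that the flush thread is supposed to release. When the flush step throws before reaching `countDown()`, the waiter stays blocked; releasing the latch in a `finally` block avoids that.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only; names and structure are assumptions, not Cassandra internals.
public class StuckLatchSketch {

    /** Stands in for the flush step that hit the assertion. */
    static void failingStep() {
        throw new AssertionError("Interval min > max");
    }

    /** Runs a failing flush task; returns true if the waiter was released within the timeout. */
    static boolean waiterReleased(boolean countDownInFinally) {
        CountDownLatch flushed = new CountDownLatch(1);
        ExecutorService flushWriter = Executors.newSingleThreadExecutor();
        // submit() captures the AssertionError in the returned Future instead of printing it.
        flushWriter.submit(() -> {
            try {
                failingStep();
                if (!countDownInFinally) {
                    flushed.countDown(); // never reached because failingStep() throws
                }
            } finally {
                if (countDownInFinally) {
                    flushed.countDown(); // released even when the step fails
                }
            }
        });
        try {
            // Stands in for a post-flush task waiting on the latch.
            return flushed.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            flushWriter.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without finally: released=" + waiterReleased(false)); // false
        System.out.println("with finally: released=" + waiterReleased(true));     // true
    }
}
```

In the first case every task queued behind the waiter would pile up, which would be consistent with the growing MemtablePostFlush pending count above, although the heap dump would be needed to confirm it.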
I'm hoping this assertion will mean something to the Cassandra development community and that it has perhaps already been fixed in a newer release.