Cassandra / CASSANDRA-13538

Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max"

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • 2.1.x
    • Local/Compaction
    • None
    • This happens on a 7-node system with 2 data centers. We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and the problem still occurs.

    • Normal

    Description

      We noticed this problem because the commitlogs accumulate until we eventually run out of disk space. nodetool tpstats shows several of the thread pools backed up with pending tasks:

      Pool Name                    Active   Pending      Completed   Blocked  All time blocked
      MutationStage                     0         0      134335315         0                 0
      ReadStage                         0         0      643986790         0                 0
      RequestResponseStage              0         0         114298         0                 0
      ReadRepairStage                   0         0             36         0                 0
      CounterMutationStage              0         0              0         0                 0
      MiscStage                         0         0              0         0                 0
      AntiEntropySessions               1         1          79357         0                 0
      HintedHandoff                     0         0             90         0                 0
      GossipStage                       0         0        6595098         0                 0
      CacheCleanupExecutor              0         0              0         0                 0
      InternalResponseStage             0         0        1638369         0                 0
      CommitLogArchiver                 0         0              0         0                 0
      CompactionExecutor                2       175        2922542         0                 0
      ValidationExecutor                0         0        1465374         0                 0
      MigrationStage                    1        76            600         0                 0
      AntiEntropyStage                  1       923        8291098         0                 0
      PendingRangeCalculator            0         0             20         0                 0
      Sampler                           0         0              0         0                 0
      MemtableFlushWriter               0         0          53017         0                 0
      MemtablePostFlush                 1      4584        1545141         0                 0
      MemtableReclaimMemory             0         0          70639         0                 0
      Native-Transport-Requests         0         0         352559         0                 0
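
The backlog in the table above can also be picked out programmatically. The following is a minimal sketch, assuming the column layout shown above (a pool name with no spaces, followed by five integer columns). `TpstatsBacklog` and `backedUpPools` are hypothetical names used for illustration; they are not part of Cassandra or nodetool.

```java
import java.util.ArrayList;
import java.util.List;

final class TpstatsBacklog {
    /** Returns "pool=pending" for every pool whose Pending column exceeds the threshold. */
    static List<String> backedUpPools(String tpstatsOutput, long threshold) {
        List<String> result = new ArrayList<>();
        for (String line : tpstatsOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Assumed layout: name, Active, Pending, Completed, Blocked, All time blocked.
            // The header row splits into more than six tokens and is skipped here.
            if (cols.length != 6) {
                continue;
            }
            try {
                long pending = Long.parseLong(cols[2]);
                if (pending > threshold) {
                    result.add(cols[0] + "=" + pending);
                }
            } catch (NumberFormatException e) {
                // Not a data row; ignore it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few rows copied from the tpstats output in this report.
        String sample = String.join("\n",
            "Pool Name                    Active   Pending      Completed   Blocked  All time blocked",
            "MutationStage                     0         0      134335315         0                 0",
            "CompactionExecutor                2       175        2922542         0                 0",
            "MigrationStage                    1        76            600         0                 0",
            "AntiEntropyStage                  1       923        8291098         0                 0",
            "MemtablePostFlush                 1      4584        1545141         0                 0");
        // Prints [CompactionExecutor=175, AntiEntropyStage=923, MemtablePostFlush=4584]
        System.out.println(backedUpPools(sample, 100));
    }
}
```

The threshold of 100 is arbitrary; with a threshold of 0 the MigrationStage row (76 pending) would be reported as well.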
      

      This all starts after the following exception is raised in Cassandra:

      ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
      java.lang.AssertionError: Interval min > max
      	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1521) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
      	at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
      	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
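
The assertion message suggests that an interval was built with a lower bound greater than its upper bound. The sketch below shows that kind of invariant check in isolation. It is not the actual `IntervalTree` source: `IntervalCheck`, `Interval`, and `isValid` are hypothetical names, and the integer bounds stand in for whatever comparable keys the real tree uses.

```java
final class IntervalCheck {
    /** A closed interval [min, max] over ints; illustrative only. */
    static final class Interval {
        final int min;
        final int max;

        Interval(int min, int max) {
            // Explicit check rather than the `assert` keyword, so it fires
            // even when the JVM runs without the -ea flag.
            if (min > max) {
                throw new AssertionError("Interval min > max");
            }
            this.min = min;
            this.max = max;
        }
    }

    /** Returns true if the bounds form a valid interval, false if the check trips. */
    static boolean isValid(int min, int max) {
        try {
            new Interval(min, max);
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(1, 5)); // prints true
        System.out.println(isValid(7, 3)); // prints false
    }
}
```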
      

      This has occurred on only one of our system testers' setups, but it happens there regularly. I couldn't begin to tell you how to reproduce it. We have many systems deployed, but only this one setup encounters the issue. I have included the jstack output, config file, log file, and schema. I also have a heap dump available if needed. After looking at the heap dump, the best I can tell is that the assertion failure left a lock (i.e. a latch) in a locked state, which then causes a backlog of pending tasks.
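
The suspected mechanism (a task that fails before releasing a latch, leaving waiters blocked) can be illustrated in isolation. This is only a sketch of the general pattern using `java.util.concurrent.CountDownLatch`, not a reproduction of Cassandra's flush path; `StuckLatchDemo`, `failingFlush`, and `waiterReleased` are hypothetical names, and the 500 ms timeout exists only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StuckLatchDemo {
    /** Stands in for a step that fails part-way through a task. */
    static void failingFlush() {
        throw new AssertionError("Interval min > max");
    }

    /**
     * Runs a task that fails before it reaches countDown().
     * Returns true if a waiter on the latch was released within the timeout.
     */
    static boolean waiterReleased(boolean countDownInFinally) {
        CountDownLatch latch = new CountDownLatch(1);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.execute(() -> {
                if (countDownInFinally) {
                    try {
                        failingFlush();
                    } finally {
                        latch.countDown(); // released even though the step failed
                    }
                } else {
                    failingFlush();
                    latch.countDown(); // never reached: the error escapes first
                }
            });
            // The worker's uncaught error is reported on stderr; that is expected here.
            return latch.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(waiterReleased(false)); // prints false: the waiter timed out
        System.out.println(waiterReleased(true));  // prints true: the latch was released
    }
}
```

In the first case a waiter without a timeout would block indefinitely, and anything queued behind it would pile up, which is consistent with the pending counts reported above.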

      I'm hoping this assertion will mean something to the Cassandra development community and that it has perhaps been fixed in a newer release.

      Attachments

        1. cassandra.yaml
          38 kB
          Andy Klages
        2. jstack.out
          142 kB
          Andy Klages
        3. schema.cql3
          107 kB
          Andy Klages
        4. system.log
          16.52 MB
          Andy Klages
        5. tpstats.out
          2 kB
          Andy Klages

          People

            Assignee: Unassigned
            Reporter: Andy Klages (aklages)
            Votes: 0
            Watchers: 3
