Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-3592

Major Compaction Incredibly Slow

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Cannot Reproduce
    • None
    • None
    • RHEL6 - 24 core machines 24 GB mem total, 11 GB java heap
      java version "1.6.0_26"
      6 node cluster (4@0.8.6, 2@1.0.3)

    • Normal

    Description

      Twice now (on different nodes), I have observed major compaction for certain column families take significantly longer on 1.0.3 in comparison to 0.8.6. For example,

      On the 0.8.6 node, the post compaction log message:

      CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 to 158,825,469,883 (~46% of original) bytes for 25,996 keys.  Time: 26,934,317ms.

      On the 1.0.3 node, the post compaction log message:

       CompactionTask.java (line 213) Compacted to [XXX].  222,338,354,529 to 147,751,403,084 (~66% of original) bytes for 26,100 keys at 0.562045MB/s. Time: 250,703,563ms.

      So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6.

      Relevant configuration settings:

      • compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling logic as currently implemented is highly unsuitable for wide rows but thats a different issue)
      • in_memory_compaction_limit_in_mb: 128

      Column family characteristics:

      • Many wide rows (~5% of rows greater than > 10MB and hundreds of rows greater than 100 MB, with many small columns).
      • Heavy use of expiring columns - each row represents data for a particular hour so typically all columns in the row will expire together.
      • The significant size shrinkage as reported by the log messages is due mainly to expired data being cleaned up (I typically trigger major compaction when 30-50% of the on disk data has expired which is about once every 3 weeks per node).
      • Perhaps obviously: size tiered compaction and no compression (the schema has not changed since the partial upgrade to 1.0.x)
      • Standard column family

      Performance notes during compaction:

      • Nice CPU usage and load average is basically the same between 0.8.6 and 1.0.3 - ie, compaction IS running and is not getting stalled or hung up anywhere.
      • Compaction is IO bound on the 0.8.6 machines - the disks see heavy, constant utilization when compaction is running.
      • Compaction is uses virtually no IO on the 1.0.3 machines - disk utilization is virtually no different when compacting vs not compacting (but at the same time, CPU usage and load average clearly indicate that compaction IS running).

      Finally, I have not had time to profile more thoroughly but jconsole always shows the following stacktrace for the active compaction thread (for the 1.0.3 machine):

      Stack trace: 
       org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
      org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
      org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
      org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
      org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
      org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
      org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
      org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
      org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
      org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
      com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
      com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
      com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
      com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
      com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
      org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
      org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277)
      java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      java.util.concurrent.FutureTask.run(FutureTask.java:138)
      java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      java.lang.Thread.run(Thread.java:662)
      

      I have not really had time to debug the issue further but given the stack trace and column family characteristics, I suspect some problem with the column family column iterator .remove() method. If the underlying ISortedColumns implementation is a ArrayBackedSortedColumns instance, then repeated .remove() calls could be incredibly inefficient.

      Attachments

        Activity

          People

            Unassigned Unassigned
            dhendry Dan Hendry
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: