Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Cannot Reproduce
- Environment: RHEL6, 24-core machines, 24 GB memory total, 11 GB Java heap; java version "1.6.0_26"; 6-node cluster (4 @ 0.8.6, 2 @ 1.0.3)
Description
Twice now (on different nodes), I have observed major compaction for certain column families take significantly longer on 1.0.3 than on 0.8.6. For example:
On the 0.8.6 node, the post-compaction log message:
CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 to 158,825,469,883 (~46% of original) bytes for 25,996 keys. Time: 26,934,317ms.
On the 1.0.3 node, the post-compaction log message:
CompactionTask.java (line 213) Compacted to [XXX]. 222,338,354,529 to 147,751,403,084 (~66% of original) bytes for 26,100 keys at 0.562045MB/s. Time: 250,703,563ms.
So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6.
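For a sanity check, the implied throughput (output bytes divided by wall-clock time) from the two log lines above works out to almost exactly a 10x difference; a throwaway snippet with the exact figures:

public class ThroughputCheck {
    public static void main(String[] args) {
        double mib = 1024.0 * 1024.0;
        // Output bytes / seconds / MiB, taken verbatim from the log lines quoted above.
        double v086 = 158825469883.0 / (26934317.0 / 1000.0) / mib;  // ~5.62 MB/s on 0.8.6
        double v103 = 147751403084.0 / (250703563.0 / 1000.0) / mib; // ~0.56 MB/s on 1.0.3 (matches the logged 0.562045MB/s)
        System.out.printf("0.8.6: %.2f MB/s, 1.0.3: %.2f MB/s, ratio: %.1fx%n", v086, v103, v086 / v103);
    }
}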
Relevant configuration settings:
- compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling logic as currently implemented is highly unsuitable for wide rows, but that's a different issue)
- in_memory_compaction_limit_in_mb: 128
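For completeness, the same two settings as a cassandra.yaml excerpt (values verbatim from above; the trailing comments are my reading of the documented 1.0.x behavior, not part of the original report):

compaction_throughput_mb_per_sec: 0    # 0 disables compaction throttling entirely
in_memory_compaction_limit_in_mb: 128  # rows larger than this spill to the slower two-pass compaction path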
Column family characteristics:
- Many wide rows (~5% of rows larger than 10 MB and hundreds of rows larger than 100 MB, with many small columns).
- Heavy use of expiring columns - each row represents data for a particular hour, so typically all columns in a row expire together (sketched after this list).
- The significant size shrinkage reported by the log messages is due mainly to expired data being cleaned up (I typically trigger major compaction when 30-50% of the on-disk data has expired, which is about once every 3 weeks per node).
- Perhaps obviously: size-tiered compaction and no compression (the schema has not changed since the partial upgrade to 1.0.x).
- Standard column family
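A minimal, purely hypothetical sketch of that row layout (the key scheme and TTL value are illustrative, not from the actual schema):

import java.util.concurrent.TimeUnit;

public class HourBucketRow {
    public static void main(String[] args) {
        // Row key = hour bucket; every column written into the row carries the
        // same TTL, so all columns in the row expire together.
        long hourBucket = System.currentTimeMillis() / TimeUnit.HOURS.toMillis(1);
        String rowKey = "data:" + hourBucket;               // hypothetical key scheme
        int ttlSeconds = (int) TimeUnit.DAYS.toSeconds(21); // ~3 weeks, roughly the purge cadence above
        System.out.println(rowKey + " ttl=" + ttlSeconds + "s");
    }
}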
Performance notes during compaction:
- Nice CPU usage and load average are basically the same between 0.8.6 and 1.0.3 - i.e., compaction IS running and is not getting stalled or hung up anywhere.
- Compaction is IO-bound on the 0.8.6 machines - the disks see heavy, constant utilization while compaction is running.
- Compaction uses virtually no IO on the 1.0.3 machines - disk utilization is virtually unchanged when compacting vs. not compacting (but at the same time, CPU usage and load average clearly indicate that compaction IS running).
Finally, I have not had time to profile more thoroughly, but jconsole always shows the following stack trace for the active compaction thread (on the 1.0.3 machine):
Stack trace:
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
I have not really had time to debug the issue further, but given the stack trace and the column family characteristics, I suspect a problem with the column family column iterator's .remove() method. If the underlying ISortedColumns implementation is an ArrayBackedSortedColumns instance, then repeated .remove() calls could be incredibly inefficient, since each removal shifts the remainder of the backing array.
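To make that suspicion concrete, here is a minimal standalone sketch, using plain java.util rather than Cassandra's classes, of why remove-while-iterating over an array-backed container goes quadratic on wide rows while a one-pass rebuild stays linear (the expiry predicate is a stand-in):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RemoveCostSketch {
    // Stand-in for a TTL check; here, "every other column has expired".
    static boolean isExpired(long c) { return c % 2 == 0; }

    // Remove-while-iterating: on an ArrayList each remove() is O(n) because
    // the tail of the backing array is copied left, so the loop is O(n^2).
    static long removeInPlace(List<Long> cols) {
        long t0 = System.nanoTime();
        for (Iterator<Long> it = cols.iterator(); it.hasNext(); ) {
            if (isExpired(it.next())) it.remove();
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    // Rebuild: copy survivors into a fresh list in one pass, O(n) overall.
    static long rebuild(List<Long> cols) {
        long t0 = System.nanoTime();
        List<Long> live = new ArrayList<Long>(cols.size());
        for (Long c : cols) {
            if (!isExpired(c)) live.add(c);
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 500_000; // same order as the column counts in the wide rows above
        List<Long> a = new ArrayList<Long>(n);
        List<Long> b = new ArrayList<Long>(n);
        for (long i = 0; i < n; i++) { a.add(i); b.add(i); }
        System.out.println("remove-in-place: " + removeInPlace(a) + " ms");
        System.out.println("rebuild:         " + rebuild(b) + " ms");
    }
}

On a row with hundreds of thousands of mostly-expired columns, the quadratic variant spends nearly all its time shifting array tails: pure CPU with almost no IO, which is consistent with the utilization pattern described above.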