[CASSANDRA-11179] Parallel cleanup can lead to disk space exhaustion - ASF JIRA

Jeff Jirsa added a comment - 17/Feb/16 23:30 - edited

Also true of scrub.

One other side effect of parallelization worth noting is that source files are not freed immediately when each individual sstable completes. If you have n concurrent compactors and one sstable is significantly smaller than the others, it will finish very quickly, but there will be a significant period during which both the original source and the resulting cleaned sstable co-exist on disk (until all n are done?).

That is, it appears that the current parallel code waits for all in-flight tasks to complete before finalizing, and because those tasks run at different speeds, operators are that much more likely to run out of disk during cleanup.

T Jake Luciani added a comment - 10/Mar/16 17:03

It looks like the cleanup issue is that we aren't clearing the transaction early in all cases, so it is held until the end of the compaction.

branch 3.0
tests
dtest

Marcus Eriksson added a comment - 11/Mar/16 13:01

I don't think that is the problem; the rewriter should already be making that call in writer.finish() (https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L367-L368).

Marcus Eriksson added a comment - 11/Mar/16 15:29

I've been testing this a bit, and I don't think we have any problem with cleanup failing to remove sstables during the operation.

I ran this: https://github.com/krummas/cassandra-dtest/commits/monitor (I will convert it to a proper dtest)
and got this output on 2.1:

/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-2-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-1-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-4-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-3-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-1-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-4-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-3-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-tmp-ka-7-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-1-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-7-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-tmp-ka-8-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-3-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-tmp-ka-9-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-1-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-8-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-7-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-tmp-ka-9-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-1-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-8-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-7-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-9-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-5-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-tmp-ka-10-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-8-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-7-Data.db
----------------
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-6-Data.db
/tmp/dtest-XWN_pU/test/node1/data0/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-9-Data.db
/tmp/dtest-XWN_pU/test/node1/data1/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-10-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-8-Data.db
/tmp/dtest-XWN_pU/test/node1/data2/keyspace1/standard1-ea6af260e79c11e5bc1783123b779c82/keyspace1-standard1-ka-7-Data.db
----------------

That is, with a single compactor we only write to a single file at a time, and the old file is gone once the tmp file disappears.

On 3.0 I get this:

/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-2-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-5-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-3-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-4-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-1-big-Data.db
----------------
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-2-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-5-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-3-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-6-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-4-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-1-big-Data.db
----------------
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-8-big-Data.db
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-2-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-3-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-6-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-7-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-1-big-Data.db
----------------
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-8-big-Data.db
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-2-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-9-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-3-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-6-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-7-big-Data.db
----------------
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-8-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-9-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-3-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-6-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-10-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-7-big-Data.db
----------------
/tmp/dtest-50KYOT/test/node1/data0/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-8-big-Data.db
/tmp/dtest-50KYOT/test/node1/data1/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-9-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-6-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-10-big-Data.db
/tmp/dtest-50KYOT/test/node1/data2/keyspace1/standard1-5dfb68a0e79c11e58938d7db499c29d8/ma-7-big-Data.db
----------------

With one compactor, the file count never goes above #original_files + 1.

So this issue is probably down to the fact that people might run 8 concurrent compactors, in which case up to 8 sstables are rewritten at once and we use up disk space much more quickly.
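The observation above suggests a back-of-the-envelope bound on how much headroom a parallel cleanup needs. The sketch below is a rough model, not how Cassandra accounts for space, and the class and method names are hypothetical. It assumes (1) a rewrite never produces more data than its input, since cleanup only drops data, and (2) each source sstable is deleted as soon as its own rewrite finishes, as the 2.1 and 3.0 listings show. Under those assumptions only in-flight rewrites hold extra space, so the worst case is the sum of the `jobs` largest sstables.

```java
import java.util.Arrays;

public class CleanupDiskHeadroom {
    /**
     * Rough upper bound on the extra disk space a parallel cleanup may need at any moment.
     * Assumes a rewrite never outgrows its input and that each source sstable is deleted
     * as soon as its own rewrite finishes, so only in-flight rewrites hold extra space.
     */
    static long worstCaseExtraSpace(long[] sstableSizes, int jobs) {
        long[] sorted = sstableSizes.clone();
        Arrays.sort(sorted); // ascending order
        int inFlight = Math.min(jobs, sorted.length);
        long extra = 0;
        for (int i = 0; i < inFlight; i++) {
            extra += sorted[sorted.length - 1 - i]; // sum the `jobs` largest sstables
        }
        return extra;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 50, 30, 20}; // hypothetical sstable sizes, e.g. in GB
        for (int jobs : new int[] {1, 2, 8}) {
            System.out.println("jobs=" + jobs + " -> up to " + worstCaseExtraSpace(sizes, jobs) + " extra");
        }
    }
}
```

With these example sizes, one concurrent rewrite needs up to 100 units of headroom, two need up to 150, and eight (more than there are sstables) need up to 200, i.e. roughly a second copy of the whole table.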

Marcus Eriksson added a comment - 14/Mar/16 13:03

Patch to add a --seq option to scrub/upgradesstables/cleanup so that the operation uses only a single thread.

branch	testall	dtest
marcuse/11179	testall	dtest
marcuse/11179-2.2	testall	dtest
marcuse/11179-3.0	testall	dtest
marcuse/11179-3.5	testall	dtest
marcuse/11179-trunk	testall	dtest

carlyeks to review, since we were poking at this in CASSANDRA-10829.

Marcus Eriksson added a comment - 14/Mar/16 13:40

... working on the failing tests

T Jake Luciani added a comment - 14/Mar/16 13:57

It wouldn't be much of a change, so I think you should make this an int rather than a boolean, so you can constrain it to anywhere from 1 to N of these at a time. Just block on N futures per iteration. Right now it's 1 or ALL.
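The "block on N futures per iteration" suggestion can be sketched as follows. This is a minimal illustration, not the committed Cassandra code: `rewrite`, `runInBatches` and the in-flight counters are hypothetical stand-ins, and the per-sstable work is simulated with a short sleep. The executor may have more threads than `jobs` (e.g. 8 compactor threads), but the batch loop never lets more than `jobs` rewrites run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedSSTableOperation {
    // Instrumentation for the sketch: rewrites currently running, and the peak seen.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for rewriting one sstable; not Cassandra's API.
    static void rewrite(String sstable) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(10); // pretend to do the cleanup work
        } finally {
            inFlight.decrementAndGet();
        }
    }

    /** Runs at most `jobs` rewrites at a time by blocking on each batch of futures. */
    static int runInBatches(List<String> sstables, int jobs, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        if (jobs < 1) {
            throw new IllegalArgumentException("jobs must be >= 1, got " + jobs);
        }
        int done = 0;
        for (int start = 0; start < sstables.size(); start += jobs) {
            List<Future<?>> batch = new ArrayList<>();
            for (String sstable : sstables.subList(start, Math.min(start + jobs, sstables.size()))) {
                batch.add(executor.submit(() -> {
                    rewrite(sstable);
                    return null; // returning a value makes this a Callable, which may throw
                }));
            }
            for (Future<?> f : batch) {
                f.get(); // wait for the whole batch before submitting the next one
                done++;
            }
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(8); // e.g. 8 compactor threads
        List<String> sstables = List.of("ma-1", "ma-2", "ma-3", "ma-4", "ma-5");
        int done = runInBatches(sstables, 2, executor);
        executor.shutdown();
        System.out.println(done + " sstables rewritten, peak in flight = " + maxInFlight.get());
    }
}
```

One trade-off of batching: because tasks run at different speeds, a single slow sstable holds back the next batch. Bounding concurrency with a `java.util.concurrent.Semaphore` holding N permits (or a pool of exactly N threads) would keep N rewrites continuously busy instead.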

Marcus Eriksson added a comment - 14/Mar/16 16:14

Updated to use --jobs X (or -j X), defaulting to 2 threads.

Tom Hobbs added a comment - 14/Mar/16 16:20

+1 on defaulting to 2 threads. I like having the default be fairly safe.

Carl Yeksigian added a comment - 23/Mar/16 17:47

Looks good. Just a couple of comments:

It would be nice to add a comment to parallelAllSSTableOperation explaining that jobs = 0 means using all compactor threads, so that we remember to propagate that to our argument explanations.
Also, it's not clear what would happen if you specified a jobs value higher than the number of concurrent compactors. The expectation is probably that it would override that setting, so either a warning or disallowing it would be helpful.
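The two review points can be pinned down with a small helper that maps the option value to a thread count. This is a hypothetical sketch, not Cassandra's actual code, and the message wording is invented. It encodes jobs = 0 as "all compactor threads" and prints a message when jobs exceeds concurrent_compactors; capping at concurrent_compactors is an assumption based on the idea that the compaction executor cannot run more tasks than it has threads.

```java
public class JobsOption {
    /**
     * Hypothetical helper mapping a -j/--jobs value to the number of sstables
     * processed concurrently.
     *  - jobs == 0: use all compactor threads.
     *  - jobs > concurrentCompactors: print a message and cap (assumption: the
     *    compaction executor cannot run more tasks than it has threads anyway).
     */
    static int effectiveJobs(int jobs, int concurrentCompactors) {
        if (jobs < 0) {
            throw new IllegalArgumentException("jobs must be >= 0, got " + jobs);
        }
        if (jobs == 0) {
            return concurrentCompactors; // 0 means "all compactor threads"
        }
        if (jobs > concurrentCompactors) {
            System.out.printf("jobs (%d) exceeds concurrent_compactors (%d); using at most %d threads%n",
                    jobs, concurrentCompactors, concurrentCompactors);
            return concurrentCompactors;
        }
        return jobs;
    }

    public static void main(String[] args) {
        System.out.println(effectiveJobs(2, 4)); // the default of 2 jobs
        System.out.println(effectiveJobs(0, 4)); // all compactor threads
        System.out.println(effectiveJobs(8, 4)); // prints a message, then caps
    }
}
```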

Marcus Eriksson added a comment - 24/Mar/16 07:24

Rebased and pushed a new commit addressing the comments to the branches above (it now outputs a message if -j > concurrent_compactors).

Carl Yeksigian added a comment - 24/Mar/16 14:30

+1

Marcus Eriksson added a comment - 29/Mar/16 09:13

committed, thanks
