Description
I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite
heavy:
- 13 nodes with spinning disks
- ~120 GB of data per node
- 50% of CFs are LCS and have quite wide rows.
- 2/3 CFs with LCS have more than 200 SStables
Incremental repairs do not seem to play really well with that.
I have been running some tests node by node by using the -pr option:
nodetool -h localhost repair -pr keyscheme
and to my surprise the whole process takes quite some time (1 hour
minimum, 8 hours if I haven't been repairing for 5/6 days).
Yesterday I tried to run the command with the -seq option so to
decrease the number of simultanoues compactions. After a while
the node on which I was running the repair simply died during
the anticompaction phase with the following
exception in the logs.
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed. java.lang.RuntimeException: Failed to list files in /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396 at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6] at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0] at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0] at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0] at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91] Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) ~[apache-cassandra-3.0.6.jar:3.0.6] at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321) ~[na:1.8.0_91] at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_91] at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_91] at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_91] at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91] at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_91] at org.apache.cassandra.db.lifecycle.LogReplicaSet.readRecords(LogReplicaSet.java:105) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.lifecycle.LogFile.verify(LogFile.java:132) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:113) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106) ~[apache-cassandra-3.0.6.jar:3.0.6] at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91] at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91] at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:1.8.0_91] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91] at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91] at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53) ~[apache-cassandra-3.0.6.jar:3.0.6] ... 20 common frames omitted Caused by: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_91] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_91] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_91] at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_91] at java.nio.file.Files.newByteChannel(Files.java:361) ~[na:1.8.0_91] at java.nio.file.Files.newByteChannel(Files.java:407) ~[na:1.8.0_91] at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[na:1.8.0_91] at java.nio.file.Files.newInputStream(Files.java:152) ~[na:1.8.0_91] at java.nio.file.Files.newBufferedReader(Files.java:2784) ~[na:1.8.0_91] at java.nio.file.Files.readAllLines(Files.java:3202) ~[na:1.8.0_91] at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:615) ~[apache-cassandra-3.0.6.jar:3.0.6] ... 45 common frames omitted WARN [HintsWriteExecutor:1] 2016-05-25 21:54:28,389 CLibrary.java:256 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24). ERROR [CompactionExecutor:2660] 2016-05-25 21:54:29,915 JVMStabilityInspector.java:139 - JVM state determined to be unstable. Exiting forcefully due to: java.io.FileNotFoundException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-92661-big-Filter.db (Too many open files) at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_91] at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[na:1.8.0_91] at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_91] at java.io.FileOutputStream.<init>(FileOutputStream.java:101) ~[na:1.8.0_91] at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.flushBf(BigTableWriter.java:431) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.doPrepare(BigTableWriter.java:461) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:303) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.io.sstable.format.SSTableWriter.prepareToCommit(SSTableWriter.java:229) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:357) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.compaction.CompactionManager.antiCompactGroup(CompactionManager.java:1303) [apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.compaction.CompactionManager.doAntiCompaction(CompactionManager.java:1232) [apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:569) [apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:479) [apache-cassandra-3.0.6.jar:3.0.6] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-3.0.6.jar:3.0.6] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Seems like that by using the -seq option, C* just opened wat too
many files. The limits are currently quite ok though, as showed below:
root@node3:/# cat /etc/security/limits.d/cassandra.conf # Provided by the cassandra package cassandra - memlock unlimited cassandra - nofile 100000 cassandra - as unlimited cassandra - nproc 8096
Today I checked the number of file opened by C* (admittedly while running
another repair without the -seq option):
pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | wc -l" [1] 14:29:43 [SUCCESS] node3 (where the repair is taking place) 25452 [2] 14:29:43 [SUCCESS] node4 46779 pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | grep 'REG' | wc -l" [1] 14:26:40 [SUCCESS] node3 (where the repair is taking place) 24020 [2] 14:26:41 [SUCCESS] node4 4050
Looks like that during a repair, lots of file descriptors on the repairing node
relate to regular files, while on all others the stark majority relate to memory
mapped files.
Nevertheless 25K is still way less then 100K (still high though).
(To be honest, when I started the repair nodetool logged it with "Starting repair
command #2". I didn't pay that much attention to that as the previous repair
had returned already for quite some time, but it might have been the case that
another repair was still lingering (weird if that was the case).
In conclusion, is this behavior normal? Shall I increase the limits, or is
AntiCompaction really playing badly with incremental repairs and LCS
(and quite overloaded to be honest) tables?
Attachments
Issue Links
- duplicates
-
CASSANDRA-11696 Incremental repairs can mark too many ranges as repaired
- Resolved
-
CASSANDRA-11739 Cache key references might cause OOM on incremental repair
- Resolved
- is duplicated by
-
CASSANDRA-11594 Too many open files on directories
- Resolved