Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-11906

Unstable JVM due too many files when anticompacting big LCS tables

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Fix Version/s: 3.0.x
    • Component/s: None
    • Labels:
    • Severity:
      Normal
    • Since Version:

      Description

      I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite
      heavy:

      • 13 nodes with spinning disks
      • ~120 GB of data per node
      • 50% of CFs are LCS and have quite wide rows.
      • 2/3 CFs with LCS have more than 200 SStables

      Incremental repairs do not seem to play really well with that.
      I have been running some tests node by node by using the -pr option:

      nodetool -h localhost repair -pr keyscheme
      

      and to my surprise the whole process takes quite some time (1 hour
      minimum, 8 hours if I haven't been repairing for 5/6 days).

      Yesterday I tried to run the command with the -seq option so to
      decrease the number of simultanoues compactions. After a while
      the node on which I was running the repair simply died during
      the anticompaction phase with the following
      exception in the logs.

      ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
      java.lang.RuntimeException: Failed to list files in /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396
      	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
      	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
      	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
      	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
      	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
      Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
      	at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321) ~[na:1.8.0_91]
      	at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_91]
      	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_91]
      	at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_91]
      	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
      	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
      	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_91]
      	at org.apache.cassandra.db.lifecycle.LogReplicaSet.readRecords(LogReplicaSet.java:105) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.lifecycle.LogFile.verify(LogFile.java:132) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:113) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91]
      	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91]
      	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
      	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91]
      	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:1.8.0_91]
      	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
      	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91]
      	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	... 20 common frames omitted
      Caused by: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
      	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_91]
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_91]
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_91]
      	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_91]
      	at java.nio.file.Files.newByteChannel(Files.java:361) ~[na:1.8.0_91]
      	at java.nio.file.Files.newByteChannel(Files.java:407) ~[na:1.8.0_91]
      	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[na:1.8.0_91]
      	at java.nio.file.Files.newInputStream(Files.java:152) ~[na:1.8.0_91]
      	at java.nio.file.Files.newBufferedReader(Files.java:2784) ~[na:1.8.0_91]
      	at java.nio.file.Files.readAllLines(Files.java:3202) ~[na:1.8.0_91]
      	at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:615) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	... 45 common frames omitted
      WARN  [HintsWriteExecutor:1] 2016-05-25 21:54:28,389 CLibrary.java:256 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24).
      ERROR [CompactionExecutor:2660] 2016-05-25 21:54:29,915 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
      java.io.FileNotFoundException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-92661-big-Filter.db (Too many open files)
      	at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_91]
      	at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[na:1.8.0_91]
      	at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_91]
      	at java.io.FileOutputStream.<init>(FileOutputStream.java:101) ~[na:1.8.0_91]
      	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.flushBf(BigTableWriter.java:431) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.doPrepare(BigTableWriter.java:461) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:303) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.io.sstable.format.SSTableWriter.prepareToCommit(SSTableWriter.java:229) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:357) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.compaction.CompactionManager.antiCompactGroup(CompactionManager.java:1303) [apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.compaction.CompactionManager.doAntiCompaction(CompactionManager.java:1232) [apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:569) [apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:479) [apache-cassandra-3.0.6.jar:3.0.6]
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-3.0.6.jar:3.0.6]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
      	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
      

      Seems like that by using the -seq option, C* just opened wat too
      many files. The limits are currently quite ok though, as showed below:

      root@node3:/# cat /etc/security/limits.d/cassandra.conf 
      # Provided by the cassandra package
      cassandra  -  memlock  unlimited
      cassandra  -  nofile   100000
      cassandra  -  as       unlimited
      cassandra  -  nproc    8096
      

      Today I checked the number of file opened by C* (admittedly while running
      another repair without the -seq option):

      pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | wc -l"
      [1] 14:29:43 [SUCCESS] node3 (where the repair is taking place)
      25452
      [2] 14:29:43 [SUCCESS] node4
      46779
      
      pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | grep 'REG' | wc -l"
      [1] 14:26:40 [SUCCESS] node3 (where the repair is taking place)
      24020
      [2] 14:26:41 [SUCCESS] node4
      4050
      

      Looks like that during a repair, lots of file descriptors on the repairing node
      relate to regular files, while on all others the stark majority relate to memory
      mapped files.

      Nevertheless 25K is still way less then 100K (still high though).

      (To be honest, when I started the repair nodetool logged it with "Starting repair
      command #2". I didn't pay that much attention to that as the previous repair
      had returned already for quite some time, but it might have been the case that
      another repair was still lingering (weird if that was the case).

      In conclusion, is this behavior normal? Shall I increase the limits, or is
      AntiCompaction really playing badly with incremental repairs and LCS
      (and quite overloaded to be honest) tables?

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                sean.mccarthy Sean McCarthy
                Reporter:
                ostefano Stefano Ortolani
                Authors:
                Sean McCarthy
              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: