CASSANDRA-10534

CompressionInfo not being fsynced on close

Details

    • Normal

    Description

      I was seeing SSTable corruption due to a CompressionInfo.db file of size 0; this happened multiple times in our testing with hard node reboots. After some investigation, it seems this file is not being fsynced, which can potentially lead to data corruption. I am working with version 2.1.9.

      I checked for fsync calls using strace and found them happening for all components except the following: CompressionInfo.db, TOC.txt and Digest.sha1. The missing fsync seems tolerable for all of these except CompressionInfo.db. A quick look through the code also did not reveal any fsync calls. I suspect that commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) caused the regression: it removed the line

       getChannel().force(true);
      

      from CompressionMetadata.Writer.close.
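
For illustration, here is a minimal sketch of the write-then-fsync pattern that the removed line provided. It is not the actual CompressionMetadata.Writer code: the class name FsyncOnClose and the method writeAndSync are hypothetical, and the payload is arbitrary. It only shows where a getChannel().force(true) call would sit relative to the flush and the close.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper, not Cassandra code: writes a payload and forces it to disk before closing.
public class FsyncOnClose {

    static void writeAndSync(File file, byte[] payload) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             DataOutputStream out = new DataOutputStream(fos)) {
            out.write(payload);
            // Push any bytes buffered in the Java stream down to the file descriptor.
            out.flush();
            // Ask the OS to persist file content and metadata (the fsync step).
            // Without this, a hard reboot may leave a zero-length file behind.
            fos.getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("CompressionInfo", ".db");
        f.deleteOnExit();
        writeAndSync(f, new byte[] {1, 2, 3, 4});
        System.out.println("wrote " + f.length() + " bytes");
    }
}
```

Passing true to force also syncs file metadata such as the length, which matters here because the symptom is a file of size 0.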

      The following trace appeared in system.log:

      INFO  [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes)
      ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
      org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
              at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
      Caused by: java.io.EOFException: null
              at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80]
              at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80]
              at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80]
              at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9]
              ... 14 common frames omitted
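
The "Caused by" frames above are consistent with reading from an empty file: DataInputStream.readUTF first reads a two-byte length, and on an empty stream that read fails with EOFException. The sketch below reproduces only that behaviour using an in-memory stream. The class and method names are hypothetical, and it does not involve any Cassandra classes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo, not Cassandra code: shows readUTF failing on zero-length input.
public class EmptyMetadataRead {

    // Returns true if readUTF hits end-of-stream before it can read a string.
    static boolean readUtfHitsEof(byte[] fileContents) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileContents))) {
            in.readUTF(); // reads a 2-byte length prefix, then that many bytes
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A zero-length array stands in for the 0-byte CompressionInfo.db file.
        System.out.println(readUtfHitsEof(new byte[0]));
    }
}
```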
      

      The following is the output of ls on the data directory of a corrupted SSTable after the hard reboot:

      $ ls -l /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/
      total 60
      -rw-r--r-- 1 cassandra cassandra     0 Oct 15 09:31 system-sstable_activity-ka-1-CompressionInfo.db
      -rw-r--r-- 1 cassandra cassandra  9740 Oct 15 09:31 system-sstable_activity-ka-1-Data.db
      -rw-r--r-- 1 cassandra cassandra     0 Oct 15 09:31 system-sstable_activity-ka-1-Digest.sha1
      -rw-r--r-- 1 cassandra cassandra   880 Oct 15 09:31 system-sstable_activity-ka-1-Filter.db
      -rw-r--r-- 1 cassandra cassandra 34000 Oct 15 09:31 system-sstable_activity-ka-1-Index.db
      -rw-r--r-- 1 cassandra cassandra  7338 Oct 15 09:31 system-sstable_activity-ka-1-Statistics.db
      -rw-r--r-- 1 cassandra cassandra     0 Oct 15 09:31 system-sstable_activity-ka-1-TOC.txt
      

            People

              stefania Stefania Alborghetti
              sharvanath Sharvanath Pathak
              Stefania Alborghetti
              Ariel Weisberg
