Cassandra
  1. Cassandra
  2. CASSANDRA-4021

CFS.scrubDataDirectories tries to delete nonexistent orphans

    Details

      Description

      The check only looks for a missing data file, then deletes all other components, however it's possible for the data file and another component to be missing, causing an error:

       WARN 17:19:28,765 Removing orphans for /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492: [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db]
      ERROR 17:19:28,766 Exception encountered during startup
      java.lang.AssertionError: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db
              at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
              at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
              at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
              at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
              at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
              at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
      java.lang.AssertionError: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db
              at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
              at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
              at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
              at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
              at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
              at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
      Exception encountered during startup: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db
      
      1. 4021.txt
        0.8 kB
        Brandon Williams
      2. node1.log
        23 kB
        Brandon Williams
      3. patch_4021.txt
        1 kB
        Jonathan Ellis

        Issue Links

          Activity

          Hide
          Brandon Williams added a comment -

          Apparently this goes all the way back to CASSANDRA-1471 but occurs very rarely. Patch to attempt deletion without caring about success.

          Show
          Brandon Williams added a comment - Apparently this goes all the way back to CASSANDRA-1471 but occurs very rarely. Patch to attempt deletion without caring about success.
          Hide
          Jonathan Ellis added a comment -

          The reasoning is that since the data component is written (renamed) last, the others should be there too if the data is.

          Show
          Jonathan Ellis added a comment - The reasoning is that since the data component is written (renamed) last, the others should be there too if the data is.
          Hide
          Brandon Williams added a comment -

          Aren't we actually firing this when the data does not exist, though?

                      if (components.contains(Component.DATA) && dataFile.length() > 0)
                          // everything appears to be in order... moving on.
                          continue;
          
                      // missing the DATA file! all components are orphaned
          
          Show
          Brandon Williams added a comment - Aren't we actually firing this when the data does not exist, though? if (components.contains(Component.DATA) && dataFile.length() > 0) // everything appears to be in order... moving on. continue ; // missing the DATA file! all components are orphaned
          Hide
          Jonathan Ellis added a comment -

          Not sure what you mean – since data file is renamed last, it's normal (if the process was killed mid-flush) for components to exist without a data file. But I can't think of a situation that would cause the inverse. Are we perhaps deleting things asynchronously somewhere?

          Show
          Jonathan Ellis added a comment - Not sure what you mean – since data file is renamed last, it's normal (if the process was killed mid-flush) for components to exist without a data file. But I can't think of a situation that would cause the inverse. Are we perhaps deleting things asynchronously somewhere?
          Hide
          Brandon Williams added a comment -

          I'm not sure how this happened, I tried to repro artificially and wasn't able to. Originally what happened is I was testing a patch that threw a TON of errors (all time was spent in logging), and after a ctrl-c and restart this happened.

          Is it really important to confirm the deletion here? Being unable to start rather sucks.

          Show
          Brandon Williams added a comment - I'm not sure how this happened, I tried to repro artificially and wasn't able to. Originally what happened is I was testing a patch that threw a TON of errors (all time was spent in logging), and after a ctrl-c and restart this happened. Is it really important to confirm the deletion here? Being unable to start rather sucks.
          Hide
          Sylvain Lebresne added a comment -

          What's weird is that the INDEX component was clearly found at the beginning of scrubDataDirectories, but didn't existed anymore when we tried the deletion. However, this happens during AbstractDaemon.setup() so I don't think there can be any concurrent process deleting the file. Anyway, I don't thing the delete confirmation is very important but I'm not fond of changing code when we don't understand what's going on. At least as long as nobody have reproduced without using a buggy patch initially.

          Show
          Sylvain Lebresne added a comment - What's weird is that the INDEX component was clearly found at the beginning of scrubDataDirectories, but didn't existed anymore when we tried the deletion. However, this happens during AbstractDaemon.setup() so I don't think there can be any concurrent process deleting the file. Anyway, I don't thing the delete confirmation is very important but I'm not fond of changing code when we don't understand what's going on. At least as long as nobody have reproduced without using a buggy patch initially.
          Hide
          Brandon Williams added a comment -

          FWIW, there has been another case of this reported, but it was on Windows.

          Show
          Brandon Williams added a comment - FWIW, there has been another case of this reported, but it was on Windows.
          Hide
          Jonathan Ellis added a comment -

          I don't suppose you have a log file for that case?

          Show
          Jonathan Ellis added a comment - I don't suppose you have a log file for that case?
          Hide
          Robert Coli added a comment -

          Trival repro case for something similar :

          1) create file named something-that-looks-like-a-sstable-but-isn't, for example the tablesnap-created filename "ObfuscatedCF-hd-8813-Statistics.db-listdir.json", which users of tablesnap are relatively likely to accidentally have in their datadir.

          2) stop node.

          3) start.

          This case seems like an issue with the sstable identifying code, real sstables names don't end with random strings like "-listdir.json".

          Show
          Robert Coli added a comment - Trival repro case for something similar : 1) create file named something-that-looks-like-a-sstable-but-isn't, for example the tablesnap-created filename "ObfuscatedCF-hd-8813-Statistics.db-listdir.json", which users of tablesnap are relatively likely to accidentally have in their datadir. 2) stop node. 3) start. This case seems like an issue with the sstable identifying code, real sstables names don't end with random strings like "-listdir.json".
          Hide
          Jonathan Ellis added a comment -

          My mind boggles that you appear to expect Cassandra to cope with arbitrary user-created files in the directories it is supposed to have control over.

          Show
          Jonathan Ellis added a comment - My mind boggles that you appear to expect Cassandra to cope with arbitrary user-created files in the directories it is supposed to have control over.
          Hide
          Robert Coli added a comment -

          I don't especially "expect" Cassandra to cope with arbitrary user-created files in the directories it is supposed to have control over. The purpose of my comment was primarily to assist any other operator who might have accidentally created such a file, who would then google the exception and be confused because this ticket was marked no-repro.

          However.. the comment for scrubDataDirectories says the following :
          "

          • Removes unnecessary files from the cf directory at startup: these include temp files, orphans, zero-length files
          • and compacted sstables. Files that cannot be recognized will be ignored.
            "

          So it is a goal to "recognize" files properly, and to "ignore" files that are not "recognized" properly.

          Further in the code we see..
          "
          if (!"snapshots".equals(name) && !"backups".equals(name) && !name.contains(".json"))
          "

          Which suggests that had my file not happened to have had suffix .json, it would have been "recognized" and at least logged an error about being an invalid file, even if it were not "ignored," I would have had a chance of reading a relevant log message ...

          I agree that the practice of creating arbitrary files named like sstables, but with an additional "-" in them should be considered hazardous!

          But as we can easily "recognize" that any file with more "-" delimited elements in them than possible are not sstables, I continue to suggest that the user might prefer to discover this before Cassandra has tried and failed to treat such a file as a sstable, and refused to start as a result of trying to scrub the broken sstable.

          Show
          Robert Coli added a comment - I don't especially "expect" Cassandra to cope with arbitrary user-created files in the directories it is supposed to have control over. The purpose of my comment was primarily to assist any other operator who might have accidentally created such a file, who would then google the exception and be confused because this ticket was marked no-repro. However.. the comment for scrubDataDirectories says the following : " Removes unnecessary files from the cf directory at startup: these include temp files, orphans, zero-length files and compacted sstables. Files that cannot be recognized will be ignored. " So it is a goal to "recognize" files properly, and to "ignore" files that are not "recognized" properly. Further in the code we see.. " if (!"snapshots".equals(name) && !"backups".equals(name) && !name.contains(".json")) " Which suggests that had my file not happened to have had suffix .json, it would have been "recognized" and at least logged an error about being an invalid file, even if it were not "ignored," I would have had a chance of reading a relevant log message ... I agree that the practice of creating arbitrary files named like sstables, but with an additional "-" in them should be considered hazardous! But as we can easily "recognize" that any file with more "-" delimited elements in them than possible are not sstables, I continue to suggest that the user might prefer to discover this before Cassandra has tried and failed to treat such a file as a sstable, and refused to start as a result of trying to scrub the broken sstable.
          Hide
          Boris Yen added a comment -

          We also see similar error message, but it is not about the 'index'. We are using 1.0.10.

          WARN [main] 2012-09-05 08:32:01,804 ColumnFamilyStore.java (line 383) Removing orphans for /opt/ruckuswireless/wsg/db/data/wsg/inventorySummary-hd-91: [Statistics.db, Filter.db, Digest.sha1, Data.db, Index.db]
          ERROR [main] 2012-09-05 08:32:01,805 AbstractCassandraDaemon.java (line 373) Exception encountered during startup
          java.lang.AssertionError: attempted to delete non-existing file inventorySummary-hd-91-Digest.sha1
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:388)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

          I was wondering if anyone could confirm that this is the same issue.

          Show
          Boris Yen added a comment - We also see similar error message, but it is not about the 'index'. We are using 1.0.10. WARN [main] 2012-09-05 08:32:01,804 ColumnFamilyStore.java (line 383) Removing orphans for /opt/ruckuswireless/wsg/db/data/wsg/inventorySummary-hd-91: [Statistics.db, Filter.db, Digest.sha1, Data.db, Index.db] ERROR [main] 2012-09-05 08:32:01,805 AbstractCassandraDaemon.java (line 373) Exception encountered during startup java.lang.AssertionError: attempted to delete non-existing file inventorySummary-hd-91-Digest.sha1 at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:388) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) I was wondering if anyone could confirm that this is the same issue.
          Hide
          Brandon Williams added a comment -

          It looks similar, what kind of environment was this in?

          Show
          Brandon Williams added a comment - It looks similar, what kind of environment was this in?
          Hide
          Boris Yen added a comment - - edited

          OS:
          Linux TEST1 2.6.32.24 #1 SMP Wed Oct 3 14:51:26 CST 2012 x86_64 GNU/Linux

          Hardware:
          24 cores, 48G memory, 12G for heap.

          The first time we saw this was after the server was shutdown abruptly (due to the hardware issue). We have not seen the same issue for a while. However, it happened again just a few days ago. Unfortunately, we could not find a consistent way to reproduce it.

          Show
          Boris Yen added a comment - - edited OS: Linux TEST1 2.6.32.24 #1 SMP Wed Oct 3 14:51:26 CST 2012 x86_64 GNU/Linux Hardware: 24 cores, 48G memory, 12G for heap. The first time we saw this was after the server was shutdown abruptly (due to the hardware issue). We have not seen the same issue for a while. However, it happened again just a few days ago. Unfortunately, we could not find a consistent way to reproduce it.
          Hide
          Brandon Williams added a comment -

          dtestbot managed to randomly reproduce this morning. It looks like a race between compaction cleanup and forcible shutdown, then startup. Log attached.

          Show
          Brandon Williams added a comment - dtestbot managed to randomly reproduce this morning. It looks like a race between compaction cleanup and forcible shutdown, then startup. Log attached.
          Hide
          Boris Yen added a comment -

          I might be wrong. From the log, it looks like C* is started twice in a row. Is this a possible cause? Anyway to workaround this?

          Show
          Boris Yen added a comment - I might be wrong. From the log, it looks like C* is started twice in a row. Is this a possible cause? Anyway to workaround this?
          Hide
          Jonathan Ellis added a comment -

          That's odd, because here's the code causing that assertion:

          .           File dataFile = new File(desc.filenameFor(Component.DATA));
                      if (components.contains(Component.DATA) && dataFile.length() > 0)
                          // everything appears to be in order... moving on.
                          continue;
          
                      // missing the DATA file! all components are orphaned
                      logger.warn("Removing orphans for {}: {}", desc, components);
                      for (Component component : components)
                      {
                          FileUtils.deleteWithConfirm(desc.filenameFor(component));
                      }
          

          I must be missing something because these are the possibilities I see:

          1. .db exists and is non-empty. we don't try to delete it.
          2. .db exists and is empty. we delete it, and do not get a "file does not exist" failure
          3. .db does not exist (is not part of components), so we do not try to delete it
          Show
          Jonathan Ellis added a comment - That's odd, because here's the code causing that assertion: . File dataFile = new File(desc.filenameFor(Component.DATA)); if (components.contains(Component.DATA) && dataFile.length() > 0) // everything appears to be in order... moving on. continue ; // missing the DATA file! all components are orphaned logger.warn( "Removing orphans for {}: {}" , desc, components); for (Component component : components) { FileUtils.deleteWithConfirm(desc.filenameFor(component)); } I must be missing something because these are the possibilities I see: .db exists and is non-empty. we don't try to delete it. .db exists and is empty. we delete it, and do not get a "file does not exist" failure .db does not exist (is not part of components), so we do not try to delete it
          Hide
          Brandon Williams added a comment - - edited

          Saw this in the wild again. It was encountered multiple times on the same component set preventing startup, in the following order: CompressionInfo, CompressionInfo, Index, CompressionInfo, CompressionInfo. After that, it was finally able to startup so we weren't able to see which components existed or did not, but it looks like it does make incremental progress on the deletions with each restart. I don't want to paper over something without understanding what's going on here, but it's highly annoying and visible when this error is fatal. We aren't making progress here too quickly, so maybe in the interim we should add more logging around this and log something at ERROR instead of dying.

          Show
          Brandon Williams added a comment - - edited Saw this in the wild again. It was encountered multiple times on the same component set preventing startup, in the following order: CompressionInfo, CompressionInfo, Index, CompressionInfo, CompressionInfo. After that, it was finally able to startup so we weren't able to see which components existed or did not, but it looks like it does make incremental progress on the deletions with each restart. I don't want to paper over something without understanding what's going on here, but it's highly annoying and visible when this error is fatal. We aren't making progress here too quickly, so maybe in the interim we should add more logging around this and log something at ERROR instead of dying.
          Hide
          Stephen Pope added a comment -

          Saw this today on one of our test machine. Windows box with Cassandra 1.1.5. It would be nice to have it log and continue instead of puking. On the bright side, it does make progress every time upon startup and eventually starts fine. Haven't been able to reproduce it since. Log:

          WARN [main] 2013-01-24 13:01:50,480 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Statistics.db, Filter.db, Digest.sha1, Data.db, Index.db]
          ERROR [main] 2013-01-24 13:01:50,480 AbstractCassandraDaemon.java (line 406) Exception encountered during startup
          java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Filter.db
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63)
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
          WARN [main] 2013-01-24 13:35:32,656 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Data.db, Digest.sha1, Filter.db, Index.db]
          ERROR [main] 2013-01-24 13:35:32,656 AbstractCassandraDaemon.java (line 406) Exception encountered during startup
          java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Data.db
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63)
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
          WARN [main] 2013-01-24 13:35:44,559 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Index.db, Digest.sha1, Data.db, Filter.db]
          ERROR [main] 2013-01-24 13:35:44,559 AbstractCassandraDaemon.java (line 406) Exception encountered during startup
          java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Data.db
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63)
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)

          Show
          Stephen Pope added a comment - Saw this today on one of our test machine. Windows box with Cassandra 1.1.5. It would be nice to have it log and continue instead of puking. On the bright side, it does make progress every time upon startup and eventually starts fine. Haven't been able to reproduce it since. Log: WARN [main] 2013-01-24 13:01:50,480 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Statistics.db, Filter.db, Digest.sha1, Data.db, Index.db] ERROR [main] 2013-01-24 13:01:50,480 AbstractCassandraDaemon.java (line 406) Exception encountered during startup java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Filter.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106) WARN [main] 2013-01-24 13:35:32,656 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Data.db, Digest.sha1, Filter.db, Index.db] ERROR [main] 2013-01-24 13:35:32,656 AbstractCassandraDaemon.java (line 406) Exception encountered during startup java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Data.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106) WARN [main] 2013-01-24 13:35:44,559 ColumnFamilyStore.java (line 393) Removing orphans for C:\Program Files\Quest Software\MessageStats Business Insights\Storage\data\Doradus\MS_1_MessageParticipantPair_Terms\Doradus-MS_1_MessageParticipantPair_Terms-he-31: [Index.db, Digest.sha1, Data.db, Filter.db] ERROR [main] 2013-01-24 13:35:44,559 AbstractCassandraDaemon.java (line 406) Exception encountered during startup java.lang.AssertionError: attempted to delete non-existing file Doradus-MS_1_MessageParticipantPair_Terms-he-31-Data.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:63) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:398) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:196) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
          Hide
          Boris Yen added a comment - - edited

          We saw this issue again on 1.0.10. It seems this happens when the file-name-change after compaction is not completed. The following are the sstables on our file system and the error we saw (the -ea option is removed):

          rw-rr- 1 byan admin 66305 Mar 5 10:07 indexBytes-hd-111-Data.db
          rw-rr- 1 byan admin 67 Mar 5 10:07 indexBytes-hd-111-Digest.sha1
          rw-rr- 1 byan admin 1936 Mar 5 10:07 indexBytes-hd-111-Filter.db
          rw-rr- 1 byan admin 333 Mar 5 10:07 indexBytes-hd-111-Index.db
          rw-rr- 1 byan admin 4336 Mar 5 10:07 indexBytes-hd-111-Statistics.db
          rw-rr- 1 byan admin 240 Mar 5 12:50 indexBytes-hd-112-Data.db
          rw-rr- 1 byan admin 67 Mar 5 12:50 indexBytes-hd-112-Digest.sha1
          rw-rr- 1 byan admin 16 Mar 5 12:50 indexBytes-hd-112-Filter.db
          rw-rr- 1 byan admin 32 Mar 5 12:50 indexBytes-hd-112-Index.db
          rw-rr- 1 byan admin 4336 Mar 5 12:50 indexBytes-hd-112-Statistics.db
          rw-rr- 1 byan admin 240 Mar 5 15:18 indexBytes-hd-113-Data.db
          rw-rr- 1 byan admin 67 Mar 5 15:18 indexBytes-hd-113-Digest.sha1
          rw-rr- 1 byan admin 16 Mar 5 15:18 indexBytes-hd-113-Filter.db
          rw-rr- 1 byan admin 32 Mar 5 15:18 indexBytes-hd-113-Index.db
          rw-rr- 1 byan admin 4336 Mar 5 15:18 indexBytes-hd-113-Statistics.db
          rw-rr- 1 byan admin 120 Mar 5 15:26 indexBytes-hd-114-Data.db
          rw-rr- 1 byan admin 67 Mar 5 15:26 indexBytes-hd-114-Digest.sha1
          rw-rr- 1 byan admin 16 Mar 5 15:26 indexBytes-hd-114-Filter.db
          rw-rr- 1 byan admin 16 Mar 5 15:26 indexBytes-hd-114-Index.db
          rw-rr- 1 byan admin 4336 Mar 5 15:26 indexBytes-hd-114-Statistics.db
          rw-rr- 1 byan admin 333 Mar 5 15:26 indexBytes-hd-115-Index.db
          rw-rr- 1 byan admin 66305 Mar 5 15:26 indexBytes-tmp-hd-115-Data.db
          rw-rr- 1 byan admin 67 Mar 5 15:26 indexBytes-tmp-hd-115-Digest.sha1
          rw-rr- 1 byan admin 1936 Mar 5 15:26 indexBytes-tmp-hd-115-Filter.db

          WARN [main] 2013-03-05 10:43:22,494 ColumnFamilyStore.java (line 383) Removing orphans for /opt/test/tt/indexBytes-hd-115: [Digest.sha1, Index.db, Filter.db, Data.db]
          ERROR [main] 2013-03-05 10:43:22,494 AbstractCassandraDaemon.java (line 373) Exception encountered during startup
          java.io.IOError: java.io.IOException: Failed to delete /opt/test/tt/indexBytes-hd-115-Digest.sha1
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:392)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
          Caused by: java.io.IOException: Failed to delete /opt/ruckuswireless/wsg/db/data/wsg/indexBytes-hd-115-Digest.sha1
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
          at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:388)
          ... 3 more

          It seems the "Descriptor" does not take the "temporary" flag into account in the "hash" and "equals" function. I was wondering if anyone could confirm this.

          Show
          Boris Yen added a comment - - edited We saw this issue again on 1.0.10. It seems this happens when the file-name-change after compaction is not completed. The following are the sstables on our file system and the error we saw (the -ea option is removed): rw-r r - 1 byan admin 66305 Mar 5 10:07 indexBytes-hd-111-Data.db rw-r r - 1 byan admin 67 Mar 5 10:07 indexBytes-hd-111-Digest.sha1 rw-r r - 1 byan admin 1936 Mar 5 10:07 indexBytes-hd-111-Filter.db rw-r r - 1 byan admin 333 Mar 5 10:07 indexBytes-hd-111-Index.db rw-r r - 1 byan admin 4336 Mar 5 10:07 indexBytes-hd-111-Statistics.db rw-r r - 1 byan admin 240 Mar 5 12:50 indexBytes-hd-112-Data.db rw-r r - 1 byan admin 67 Mar 5 12:50 indexBytes-hd-112-Digest.sha1 rw-r r - 1 byan admin 16 Mar 5 12:50 indexBytes-hd-112-Filter.db rw-r r - 1 byan admin 32 Mar 5 12:50 indexBytes-hd-112-Index.db rw-r r - 1 byan admin 4336 Mar 5 12:50 indexBytes-hd-112-Statistics.db rw-r r - 1 byan admin 240 Mar 5 15:18 indexBytes-hd-113-Data.db rw-r r - 1 byan admin 67 Mar 5 15:18 indexBytes-hd-113-Digest.sha1 rw-r r - 1 byan admin 16 Mar 5 15:18 indexBytes-hd-113-Filter.db rw-r r - 1 byan admin 32 Mar 5 15:18 indexBytes-hd-113-Index.db rw-r r - 1 byan admin 4336 Mar 5 15:18 indexBytes-hd-113-Statistics.db rw-r r - 1 byan admin 120 Mar 5 15:26 indexBytes-hd-114-Data.db rw-r r - 1 byan admin 67 Mar 5 15:26 indexBytes-hd-114-Digest.sha1 rw-r r - 1 byan admin 16 Mar 5 15:26 indexBytes-hd-114-Filter.db rw-r r - 1 byan admin 16 Mar 5 15:26 indexBytes-hd-114-Index.db rw-r r - 1 byan admin 4336 Mar 5 15:26 indexBytes-hd-114-Statistics.db rw-r r - 1 byan admin 333 Mar 5 15:26 indexBytes-hd-115-Index.db rw-r r - 1 byan admin 66305 Mar 5 15:26 indexBytes-tmp-hd-115-Data.db rw-r r - 1 byan admin 67 Mar 5 15:26 indexBytes-tmp-hd-115-Digest.sha1 rw-r r - 1 byan admin 1936 Mar 5 15:26 indexBytes-tmp-hd-115-Filter.db WARN [main] 2013-03-05 10:43:22,494 ColumnFamilyStore.java (line 383) Removing orphans for /opt/test/tt/indexBytes-hd-115: [Digest.sha1, Index.db, Filter.db, Data.db] ERROR [main] 2013-03-05 10:43:22,494 AbstractCassandraDaemon.java (line 373) Exception encountered during startup java.io.IOError: java.io.IOException: Failed to delete /opt/test/tt/indexBytes-hd-115-Digest.sha1 at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:392) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) Caused by: java.io.IOException: Failed to delete /opt/ruckuswireless/wsg/db/data/wsg/indexBytes-hd-115-Digest.sha1 at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:388) ... 3 more It seems the "Descriptor" does not take the "temporary" flag into account in the "hash" and "equals" function. I was wondering if anyone could confirm this.
          Hide
          Jonathan Ellis added a comment -

          Committed Boris's patch from the dev list (attached).

          Show
          Jonathan Ellis added a comment - Committed Boris's patch from the dev list (attached).

            People

            • Assignee:
              Boris Yen
              Reporter:
              Brandon Williams
              Reviewer:
              Jonathan Ellis
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development