Cassandra / CASSANDRA-3306

Failed streaming may cause duplicate SSTable reference

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.7, 1.2.0 beta 2
    • Component/s: None
    • Labels: None

      Description

      During stress testing, I always get this error, making the leveled compaction strategy unusable. It should be easy to reproduce - just write fast.

      ERROR [CompactionExecutor:6] 2011-10-04 15:48:52,179 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:6,5,main]
      java.lang.AssertionError
      at org.apache.cassandra.db.DataTracker$View.newSSTables(DataTracker.java:580)
      at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:546)
      at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:268)
      at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:232)
      at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:960)
      at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:199)
      at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:47)
      at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:131)
      at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

      And this is the JSON manifest data for the table:

      {
        "generations" : [
          { "generation" : 0, "members" : [ 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484 ] },
          { "generation" : 1, "members" : [ ] },
          { "generation" : 2, "members" : [ ] },
          { "generation" : 3, "members" : [ ] },
          { "generation" : 4, "members" : [ ] },
          { "generation" : 5, "members" : [ ] },
          { "generation" : 6, "members" : [ ] },
          { "generation" : 7, "members" : [ ] }
        ]
      }

        Activity

        hsn Radim Kolar added a comment -

        Another problem: why not store the data in some system CF? It would probably be a safer choice.

        ERROR [CompactionExecutor:5] 2011-10-04 17:13:13,922 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:5,5,main]
        java.io.IOError: java.io.IOException: Failed to rename \var\lib\cassandra\data\test\sipdb.json to \var\lib\cassandra\data\test\sipdb-old.json
        at org.apache.cassandra.db.compaction.LeveledManifest.serialize(LeveledManifest.java:382)
        at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:182)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:152)
        at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:466)
        at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
        at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:232)
        at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:960)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:199)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:47)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:131)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
        Caused by: java.io.IOException: Failed to rename \var\lib\cassandra\data\test\sipdb.json to \var\lib\cassandra\data\test\sipdb-old.json
        at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:64)
        at org.apache.cassandra.db.compaction.LeveledManifest.serialize(LeveledManifest.java:375)
        ... 15 more

        jbellis Jonathan Ellis added a comment -

        Because then you get into hairy cyclical situations where you can't read the manifest until you replay the commitlog, but replaying the commitlog requires writing new sstables and thus knowing the manifest.

        hsn Radim Kolar added a comment -

        As I understand it, newly flushed tables are placed at level 0. Just replay the commitlog and put all the new stuff in level 0; after the commitlog is done, it can do its voodoo shuffles.

        But why not rename tables like table-h-333-l1-Data.db?

        The idea of having sstables with non-overlapping key ranges is interesting, but read performance is kind of slow (about 50% of normal) here. Is the Cassandra core modified to take advantage of leveled tables? I.e., does it search one sstable at level 1 and one at level 2, using bloom filters for the key?

        jbellis Jonathan Ellis added a comment -

        This isn't really a great place to rehash http://leveldb.googlecode.com/svn/trunk/doc/impl.html and CASSANDRA-1608.

        brandon.williams Brandon Williams added a comment -

        why not store the data in some system CF? It would probably be a safer choice.

        This has historically been a bad idea, see CASSANDRA-1155, then CASSANDRA-1318 and finally CASSANDRA-1430.

        slebresne Sylvain Lebresne added a comment -

        I don't suppose you were using column family truncation in your tests, were you?

        hsn Radim Kolar added a comment -

        No truncation, no supercolumns.

        slebresne Sylvain Lebresne added a comment -

        Are you still able to reproduce reliably? Because we aren't and being able to would help considerably, so if you are and could share whatever script you're using to reproduce, that would be awesome.

        hsn Radim Kolar added a comment -

        I tested it on 1.0 final and it worked without error for one test run. I will give it another test without the index.

        slebresne Sylvain Lebresne added a comment -

        I'll note that Ramesh Natarajan reported on the mailing list what clearly appears to be the same bug (http://www.mail-archive.com/user@cassandra.apache.org/msg18146.html), but without using leveled compaction. I also think he was using 1.0.0 final.

        slebresne Sylvain Lebresne added a comment -

        I'll note that more info has been added to the messages of the exception thrown here in 1.0.1. So if someone can reproduce this issue on 1.0.1, it would be useful to get the stacktrace (the full system.log would actually be even better).

        jonma MaHaiyang added a comment -

        This AssertionError always happens in Cassandra 1.0.0, not only with LeveledCompactionStrategy.

        jonma MaHaiyang added a comment -

        I suppose it's a bug in DataTracker.

        slebresne Sylvain Lebresne added a comment -

        As I already said, if you are able to reproduce this, please try reproducing with 1.0.3. And if you are still able to, please attach your system.log with the exception here, because it will have more info on the error that should help. And if you're not able to reproduce with 1.0.3, then I guess it means we've fixed it without knowing.

        joelastpass Joe Siegrist added a comment -

        Running under 1.0.4, I can easily reproduce this by just kicking off a repair of any LeveledCompactionStrategy CF.

        The 'zero' on the assert is the asserted value (I added that to the code to see what the value was):

        java.lang.AssertionError: 0
        at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
        at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
        at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
        at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
        at org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
        at org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
        at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

        Relevant lines from system.log leading up to it:

        INFO [FlushWriter:794] 2011-12-01 14:23:22,966 Memtable.java (line 275) Completed flushing /var/lib/cassandra/data/sso/Sessions-hc-12524-Data.db (1119784 bytes)
        INFO [CompactionExecutor:2379] 2011-12-01 14:23:22,969 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12501-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12517-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12513-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12512-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12502-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12507-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12519-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12500-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12508-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12504-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12510-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12515-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12509-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12524-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12514-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12518-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12505-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12516-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12511-Data.db'), SSTableReader(path='/var/lib/cassandra/data/sso/Sessions-hc-12506-Data.db')]
        INFO [AntiEntropyStage:1] 2011-12-01 14:25:06,321 AntiEntropyService.java (line 186) repair #ea080b70-1c51-11e1-0000-692e0c239dfd Received merkle tree for Sessions from /xxxxxxxx
        ERROR [Thread-177] 2011-12-01 14:25:17,863 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-177,5,main]
        java.lang.AssertionError: 0 [see above]

        If you want more, let me know; I can reproduce it instantly.
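
        For reference, the instrumentation Joe describes is just Java's assert-with-detail form; a minimal, self-contained illustration follows (the variable name is made up, not the one actually asserted in LeveledManifest.promote):

        // Run with assertions enabled: java -ea AssertDetailExample
        public class AssertDetailExample
        {
            public static void main(String[] args)
            {
                int newLevel = 0; // hypothetical value being checked
                // The expression after ':' becomes the AssertionError message,
                // which is how the "0" shows up in the stack trace above.
                assert newLevel > 0 : newLevel;
            }
        }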

        jbellis Jonathan Ellis added a comment -

        Joe, your assertion is the one in CASSANDRA-3536 (where I've attached a patch fixing it). Closing this other one as cantrepro.

        yukim Yuki Morishita added a comment -

        This error actually happens on 1.1, and I can easily reproduce it with a unit test (test code attached).

            [junit] ERROR 17:34:46,696 Fatal exception in thread Thread[CompactionExecutor:3,1,main]
            [junit] java.lang.AssertionError: Expecting new size of 2, got 1 while replacing [SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-1-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-5-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-4-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-2-Data.db')] by [SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-6-Data.db')] in View(pending_count=0, sstables=[SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-1-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-2-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-4-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-4-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-5-Data.db')], compacting=[SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-1-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-5-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-4-Data.db'), SSTableReader(path='build/test/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-2-Data.db')])
            [junit] 	at org.apache.cassandra.db.DataTracker$View.newSSTables(DataTracker.java:651)
            [junit] 	at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:616)
            [junit] 	at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320)
            [junit] 	at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:253)
            [junit] 	at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:994)
            [junit] 	at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
            [junit] 	at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
            [junit] 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
            [junit] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
            [junit] 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
            [junit] 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
            [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            [junit] 	at java.lang.Thread.run(Thread.java:680)
        

        The cause is actually in streaming. StreamInSession can add a duplicate reference to an SSTable to the DataTracker when it is left around even after the stream session finishes. This typically happens when the source node is marked as dead by the FailureDetector during the streaming session (a GC storm is the one I saw) and keeps sending files in the same session after the node comes back.
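
        To make the failure mode concrete, here is a heavily simplified sketch of the double-add; the class and field names are illustrative only (the real StreamInSession tracks SSTableReaders, not strings):

        import java.util.ArrayList;
        import java.util.List;

        // Simplified sketch, not the real streaming code: a resurrected session that was
        // created with an empty pending-file list believes it is "finished" after every
        // received file, so it hands its accumulated readers to the tracker repeatedly.
        public class DuplicateReferenceSketch
        {
            static final List<String> readers = new ArrayList<String>(); // readers built by this session
            static final List<String> tracker = new ArrayList<String>(); // what DataTracker ends up holding

            static void fileReceived(String reader, int pendingFiles)
            {
                readers.add(reader);
                if (pendingFiles == 0)       // session thinks the stream is complete...
                    tracker.addAll(readers); // ...and registers every reader seen so far, again
            }

            public static void main(String[] args)
            {
                fileReceived("Standard1-hf-4-Data.db", 0); // added once
                fileReceived("Standard1-hf-5-Data.db", 0); // hf-4 is now added a second time
                System.out.println(tracker);               // prints the hf-4 entry twice
            }
        }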

        yukim Yuki Morishita added a comment -

        Test code attached. Compaction strategy is not related.

        slebresne Sylvain Lebresne added a comment -

        Good analysis Yuki. I'm not really sure what the right fix is, though. Given that this should very rarely happen (repair uses a much higher failure detection threshold than the normal one, though maybe we can increase it even more to make this even less likely) and that I don't see any obvious way to avoid that kind of situation, maybe making DataTracker handle duplicate addition of an SSTableReader is the simplest thing to do. The obvious way to do that would be to change the View sstables List to a Set, which leads me to the current commentary in the code:

                // We can't use a SortedSet here because "the ordering maintained by a sorted set (whether or not an
                // explicit comparator is provided) must be <i>consistent with equals</i>."  In particular,
                // ImmutableSortedSet will ignore any objects that compare equally with an existing Set member.
                // Obviously, dropping sstables whose max column timestamp happens to be equal to another's
                // is not acceptable for us.  So, we use a List instead.
        

        I think that comment is obsolete. Namely, it was added with CASSANDRA-2498, and at the time the list of sstables was kept in max timestamp order at all times. But since then, we've moved the sorting by max timestamp into CollationController directly (which is less fragile), so the order inside DataTracker doesn't matter anymore.
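
        As a rough sketch of the idea (not the actual patch; the real View holds SSTableReaders in immutable collections), the point is simply that a Set collapses a duplicate add instead of keeping two entries:

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.LinkedHashSet;
        import java.util.List;
        import java.util.Set;

        // Rough sketch only; strings stand in for SSTableReaders.
        public class ListVsSetSketch
        {
            public static void main(String[] args)
            {
                List<String> asList = new ArrayList<String>(Arrays.asList("hf-1", "hf-2"));
                asList.add("hf-2");          // duplicate reference slips in
                System.out.println(asList);  // [hf-1, hf-2, hf-2]

                Set<String> asSet = new LinkedHashSet<String>(Arrays.asList("hf-1", "hf-2"));
                asSet.add("hf-2");           // duplicate add is a no-op
                System.out.println(asSet);   // [hf-1, hf-2]
            }
        }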

        jbellis Jonathan Ellis added a comment - - edited

        This typically happens when the source node is marked as dead by the FailureDetector during the streaming session (a GC storm is the one I saw) and keeps sending files in the same session after the node comes back

        But we close the session on convict, so shouldn't it start a new one?

        yukim Yuki Morishita added a comment -

        But we close the session on convict, so shouldn't it start a new one?

        Yes, StreamInSession gets closed and removed on convict once. But if a GC pause happens in the middle of a streaming session, the node resumes streaming in the same session after the GC. Since the resumed stream carries a session ID that was already closed on the receiver side, a StreamInSession is created again with the same old session ID, and this time with just one file to receive.
        This continues again and again until the source node's StreamOutSession has sent all files.
        You can see this in the receiver's log file, like below:

        INFO [Thread-50] 2012-10-20 13:13:26,574 StreamInSession.java (line 214) Finished streaming session 10 from /10.xx.xx.xx
        INFO [Thread-51] 2012-10-20 13:13:29,691 StreamInSession.java (line 214) Finished streaming session 10 from /10.xx.xx.xx
        INFO [Thread-52] 2012-10-20 13:13:32,957 StreamInSession.java (line 214) Finished streaming session 10 from /10.xx.xx.xx
        

        Duplication happens during this partially broken streaming session. Because the StreamInSession is removed after sending the SESSION_FINISHED reply, and the StreamOutSession keeps sending files, sometimes the same StreamInSession instance receives more than one file and calls closeIfFinished every time it receives a file.
        (Sorry, this is hard to explain in words.
        https://github.com/apache/cassandra/blob/cassandra-1.1.6/src/java/org/apache/cassandra/streaming/StreamInSession.java#L181 this part is executed multiple times, with readers growing with each newly received file.)

        So as Sylvain stated above, changing DataTracker.View's sstables to a Set is one way to eliminate the duplicate reference, and we should do it. In addition, I'm thinking of not creating a duplicate StreamInSession by checking StreamHeader.pendingFiles, because this field is only filled when initiating streaming.
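
        A hedged sketch of that second idea (the types and fields here are assumptions, not the real streaming classes): if a header arrives for an unknown session but carries no pending files, refuse to create a fresh session and fail the stream instead.

        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Illustrative guard only; not Cassandra's actual streaming code.
        public class SessionGuardSketch
        {
            static class Header { long sessionId; List<String> pendingFiles; }
            static class Session { }

            static final Map<Long, Session> sessions = new HashMap<Long, Session>();

            // pendingFiles is only populated on the first message of a session, so an unknown
            // session id combined with an empty pending list means we are being handed the tail
            // of a session that was already closed -- fail it instead of resurrecting it.
            static Session sessionFor(Header header)
            {
                Session session = sessions.get(header.sessionId);
                if (session == null)
                {
                    if (header.pendingFiles.isEmpty())
                        throw new IllegalStateException("file received for closed session " + header.sessionId);
                    session = new Session();
                    sessions.put(header.sessionId, session);
                }
                return session;
            }
        }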

        slebresne Sylvain Lebresne added a comment -

        That code is a mess, so let me take a shot at describing what happens, for the record. Say node1 wants to stream files A, B and C to node2. If everything goes well, what happens is:

        1. node1 sends the first file A with a StreamHeader that says that A, B and C are pending files and A is the currently sent file. On node2, a new StreamInSession is created with that information.
        2. Once A is finished, node2 removes A from the pending files in the StreamInSession and sends an acknowledgement to node1; node1 then sends B with a StreamHeader with no pending files (basically the list of pending files is only sent the first time, so that the StreamInSession on node2 knows when everything is finished) and B as the current file. When node2 receives that StreamHeader, it retrieves the StreamInSession, setting B as the current file.
        3. Once B is finished, node2 removes it from the pending files, acks to node1, and node1 sends C with a StreamHeader with no pending files and C as the current file. Node2 retrieves the StreamInSession and modifies it accordingly.
        4. At last, once C is finished, node2 removes it from the pending files. It then realizes the pending files are empty, and thus that the streaming is finished, and at that point it adds all the SSTableReaders created so far to the cfs (and acks the end of the streaming to node1).

        Now, the problem is if, say, node1 is mistakenly marked dead by node2 during, say, the streaming of A. If that happens, the only thing we do on node2 is close the session and remove the StreamInSession from the global sessions map. However, we don't shut down the stream or anything, so if node1 is in fact still alive, what will happen is:

        1. A will finish its transfer correctly. Once that's done, node2 will still send an acknowledgement (probably the first mistake: we could check that the session has been closed and send an error instead).
        2. Node1, getting its acknowledgement, will send B with a StreamHeader that has B as the current file and no pending files, as usual. On reception, node2 will not find any StreamInSession (it was removed during the close), and so it will create a new one as if this were the beginning of a transfer. And that session will have no pending files (second mistake: if we have to create a new StreamInSession but there is no pending file at all, something wrong has happened).
        3. Once B is fully streamed, node2 will acknowledge it to node1 and remove it from its StreamInSession. But that session is the new one we just created with no pending files. So the StreamInSession will consider the streaming finished, and it will thus add the SSTableReader for B to the cfs.
        4. Because B has been acknowledged, node1 will start sending C (again, with no pending files in the StreamHeader). This will happen as soon as B is finished, and so concurrently with the StreamInSession on node2 closing itself.
        5. So when node2 receives the StreamHeader with C, it will try to retrieve the session and will find the previous session, and it will happily add C as the current file for that session (third and fourth mistakes: StreamInSession should not add a file as current unless it is a pending file for this session, and a session could detect that it's being reused even though it has just declared itself finished).
        6. Now when the C transfer finishes, the session will be notified, and since it still has no pending files, it will once again consider the streaming complete. But since it's still the same session, it still has the SSTableReader for B in its list of created readers (as well as the one for C now). And that's when it adds B a second time to the DataTracker.

        I also note that we end up never having added the SSTableReader for A to the cfs, since the very first StreamInSession was never finished. This is not a big deal in that the stream itself has been reported as failed to the client anyway, but it shows that it's not just a problem of duplicating an SSTableReader reference.

        Anyway, to get back to what I said earlier: we should definitely fix some if not all of the "mistakes" above (and send a SESSION_FAILURE to node1 as soon as we detect something is wrong).

        But that being said, my comment about the comment in DataTracker being obsolete still stands, and replacing the list with a set in there would at least have the advantage of slightly simplifying the code of DataTracker.View.newSSTables(), as well as being more resilient if an SSTableReader is added twice. Not a big deal though.
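
        As a sketch of what guards for the first and third mistakes could look like (the names and shapes below are assumptions, not the actual StreamInSession code):

        import java.util.HashSet;
        import java.util.Set;

        // Illustrative only; not the real streaming classes.
        public class StreamGuardsSketch
        {
            private final Set<String> pendingFiles = new HashSet<String>();
            private boolean closed;

            // Mistake 1: a transfer that completes on an already-closed session should be
            // reported back as a failure (e.g. SESSION_FAILURE) rather than acknowledged.
            public boolean shouldAcknowledge()
            {
                return !closed;
            }

            // Mistake 3: refuse to accept a "current file" the session never expected.
            public void setCurrentFile(String file)
            {
                if (closed || !pendingFiles.contains(file))
                    throw new IllegalStateException("unexpected file " + file + " for this session");
            }
        }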

        yukim Yuki Morishita added a comment -

        Attaching a first attempt.
        I changed DataTracker.View's sstables to a Set, and made the stream fail when a file arrives after the StreamInSession has failed.

        Changing the List to a Set for sstables sometimes makes CollationControllerTest fail. It was introduced in CASSANDRA-4116, and I think the test and CollationController#collectAllData expect sstables to be ordered by timestamp. I'm not sure whether the test is obsolete or we really need sstables to be sorted all the time.
        The 0002 patch alone will fix the issue, so we can apply that for now.

        slebresne Sylvain Lebresne added a comment -

        For patch 0002, we shouldn't check the FailureDetector, otherwise we don't really fix the issue. The only way we know this bug can happen is when the FailureDetector has marked a node down when it shouldn't have (besides, we just got something from the node, so it's fair to assume it is alive).

        and I think the test and CollationController#collectAllData expect sstables to be ordered by timestamp

        It doesn't seem to me that collectAllData needs sstables ordered. In fact, I think it does a second pass over the sstable iterators precisely because it doesn't assume sstables are ordered by max timestamp. Moreover, I'm pretty sure it would be a bug to assume that. If you look at DataTracker.View.newSSTables, it ends with Iterables.addAll(newSSTables, replacements), which clearly won't maintain any specific ordering of sstables.

        I'm not sure whether the test is obsolete.

        I don't think the test is obsolete, but I think we have a minor bug in CollationController. The test wants to check that we correctly exclude sstables whose maxTimestamp is less than the most recent row tombstone we have. But that test checks controller.getSstablesIterated(), and for collectAllData, it will count every sstable included in the first iteration of collectAllData but won't remove those that are removed by the second pass. In other words, I think the correct fix is to decrement sstablesIterated in CollationController when we remove an sstable in the second pass (or more simply, to set it to iterators.size() just before we collate everything).
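
        Illustratively, the simpler of the two fixes could look something like the sketch below; the class and field names are placeholders, not the actual CollationController source:

        import java.util.Iterator;
        import java.util.List;

        // Sketch only: whatever the second pass drops, the counter is recomputed from what
        // is left just before collation, so it reflects only the sstables actually read.
        public class SstablesIteratedSketch
        {
            static class SstableScan { long maxTimestamp; }

            static int countAfterSecondPass(List<SstableScan> scans, long mostRecentTombstone)
            {
                for (Iterator<SstableScan> it = scans.iterator(); it.hasNext(); )
                {
                    if (it.next().maxTimestamp < mostRecentTombstone)
                        it.remove(); // second pass: drop sstables older than the row tombstone
                }
                return scans.size(); // i.e. sstablesIterated = iterators.size()
            }
        }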

        yukim Yuki Morishita added a comment -

        we shouldn't check the FailureDetector otherwise we don't really fix the issue.

        Ok. I've fixed this and reattached 0002.

        The test wants to check that we correctly exclude sstables whose maxTimestamp is less than the most recent row tombstone we have.

        Right. But I think the test assumes that SSTables are added to the List in order of flush, and that's true as long as we use a List. So what I suggest is removing that part from the test, since we no longer use a List.
        And the sstablesIterated counter in collectAllData is doing fine, because we actually read the data from the sstable when we go over

        IColumnIterator iter = filter.getSSTableColumnIterator(sstable);
        

        before incrementing the counter.

        So I removed that test from CollationControllerTest in 0001-change-DataTracker.View-s-sstables-from-List-to-Set.patch.

        slebresne Sylvain Lebresne added a comment -

        Ok. I've fixed this and reattached 0002.

        Alright, +1 on 0002. Let's commit that for now to 1.1/1.2, as it fixes this ticket.

        the test assumes that SSTables are added to List in order of flush
        And sstablesIterated counter in collectAllData is doing fine because we actually read the data

        Right. I guess what I meant is that what is tested right now is not really sensible. Relying on the order of flush is only valid for a small, controlled test, but in reality, as soon as compaction kicks in, the order of sstables in DataTracker will be meaningless even with a List instead of a Set. Basically, the guarantee collectAll gives us today is that it will eliminate sstables whose maxTimestamp < mostRecentTombstone having read just the sstable row header, not the full data. But that's not what sstablesIterated counts, so it's broken.

        That being said, I think we can improve collectAll in the way described in CASSANDRA-4883. If we do so, the test will pass again without relying on any assumption of the order of sstables in DataTracker. So overall I suggest moving all of this to CASSANDRA-4883.

        yukim Yuki Morishita added a comment -

        Committed 0002 to 1.1 and trunk.

        dashv Christopher Vincelette added a comment -

        I understand that this has been fixed in newer versions of Cassandra.

        But I'm currently seeing this exact issue on a production 1.1.1 node in my cluster.

        What should be my next step?

        Do I simply restart it?

        Run cleanup? Scrub? Repair?

        Sounds like repair would just fail with the same problem.

        Any advice would be appreciated.

        yukim Yuki Morishita added a comment -

        Yes, restarting the node will help.
        No need to clean up/scrub.

        Please use the user@cassandra.apache.org mailing list for this type of question.

        dashv Christopher Vincelette added a comment -

        Thanks for the swift reply and I will use the mailing list in the future.


          People

          • Assignee: yukim Yuki Morishita
          • Reporter: hsn Radim Kolar
          • Reviewer: Sylvain Lebresne
          • Votes: 1
          • Watchers: 9
