Cassandra / CASSANDRA-4221

Error while deleting a columnfamily that is being compacted.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: Core
    • Labels:
      None
    • Environment:

      ccm, dtest, cassandra-1.1. The error does not happen in cassandra-1.0.

      Description

      The following dtest command produces an error:

      export CASSANDRA_VERSION=git:cassandra-1.1; nosetests --nocapture --nologcapture concurrent_schema_changes_test.py:TestConcurrentSchemaChanges.load_test

      Here is the error:

      Error occured during compaction
      java.util.concurrent.ExecutionException: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-6ECMgy/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-47-Data.db (No such file or directory)
      	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
      	at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:239)
      	at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1580)
      	at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1770)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:616)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
      	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
      	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
      	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
      	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
      	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
      	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
      	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
      	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
      	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
      	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:616)
      	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
      	at sun.rmi.transport.Transport$1.run(Transport.java:177)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
      	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:679)
      Caused by: java.io.IOError: java.io.FileNotFoundException: /tmp/dtest-6ECMgy/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-47-Data.db (No such file or directory)
      	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:61)
      	at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:839)
      	at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:851)
      	at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:142)
      	at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:148)
      	at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:121)
      	at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:264)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      	... 3 more
      Caused by: java.io.FileNotFoundException: /tmp/dtest-6ECMgy/test/node1/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-47-Data.db (No such file or directory)
      	at java.io.RandomAccessFile.open(Native Method)
      	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
      	at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
      	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:102)
      	at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:87)
      	at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:985)
      	at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:56)
      	... 13 more
      

      For reference, here is the dtest function that causes the failure. The error happens on the line near the bottom that drops the columnfamily:

          def load_test(self):
              """
              Apply schema changes while the cluster is under load.
              """
              debug("load_test()")

              cluster = self.cluster
              cluster.populate(1).start()
              node1 = cluster.nodelist()[0]
              wait(2)
              cursor = self.cql_connection(node1).cursor()

              def stress(args=[]):
                  debug("Stressing")
                  node1.stress(args)
                  debug("Done Stressing")

              def compact():
                  debug("Compacting...")
                  node1.nodetool('compact')
                  debug("Done Compacting.")

              # put some data into the cluster
              stress(['--num-keys=1000000'])

              # now start compacting...
              tcompact = Thread(target=compact)
              tcompact.start()
              wait(1)

              # now the cluster is under a lot of load. Make some schema changes.
              cursor.execute("USE Keyspace1")
              wait(1)
              cursor.execute("DROP COLUMNFAMILY Standard1")

              wait(3)

              cursor.execute("CREATE COLUMNFAMILY Standard1 (KEY text PRIMARY KEY)")

              tcompact.join()


      Again, the error happens on cassandra-1.1, but not on cassandra-1.0.

      1. CASSANDRA-4221.patch
        5 kB
        Pavel Yaskevich
      2. CASSANDRA-4221-logging.patch
        4 kB
        Pavel Yaskevich
      3. system.log
        69 kB
        Tyler Patterson
      4. CASSANDRA-4221-v2.patch
        6 kB
        Pavel Yaskevich
      5. 4221-v3.txt
        11 kB
        Jonathan Ellis

        Activity

        Pavel Yaskevich added a comment -

        This one seems to be caused by the same problem as CASSANDRA-4230.

        Jonathan Ellis added a comment -

        Maybe, but I'm skeptical – 4230 is complaining about a file existing when it shouldn't, while this one says a file doesn't exist that should

        Pavel Yaskevich added a comment -

        The patch adds an attempt to stop all running compactions on the given Keyspace or ColumnFamily before running a drop command. I tried the test from the description and it ran without failures.

        Jonathan Ellis added a comment -

        That takes us back to the Bad Old Days pre-CASSANDRA-3116, though. We should be able to fix w/o resorting to A Big Lock.

        Pavel Yaskevich added a comment -

        For a KS or CF drop it seems necessary to try to wait until all running compactions finish; otherwise it ends up in errors like the one in the description. Also, other operations (create, update) are not affected by this.

        Jonathan Ellis added a comment -

        The idea from 3116 was:

        • Drop will only delete sstables not actively being compacted
        • post-compaction, we check if the CF was dropped, and if so we delete the sstables then
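        The two bullets above can be sketched as bookkeeping. This is a minimal Python model of the idea, not Cassandra's actual Java code; every name here (SSTableTracker, begin_compaction, and so on) is invented for illustration:

        ```python
        import threading

        class SSTableTracker:
            """Toy model of the CASSANDRA-3116 idea: DROP deletes only sstables
            that are not being compacted; sstables still referenced by a running
            compaction are deleted when that compaction finishes."""

            def __init__(self, sstables):
                self.lock = threading.Lock()
                self.sstables = set(sstables)   # live sstables on disk
                self.compacting = set()         # inputs held by a running compaction
                self.dropped = False            # set by DROP COLUMNFAMILY

            def begin_compaction(self, group):
                with self.lock:
                    self.compacting |= set(group)

            def drop(self):
                """DROP: delete immediately what we can, defer the rest."""
                with self.lock:
                    self.dropped = True
                    deletable = self.sstables - self.compacting
                    self.sstables -= deletable
                    return deletable            # caller unlinks these files now

            def end_compaction(self, group):
                """Post-compaction hook: if the CF was dropped meanwhile,
                delete the inputs we were holding instead of keeping them."""
                with self.lock:
                    self.compacting -= set(group)
                    if self.dropped:
                        late = self.sstables & set(group)
                        self.sstables -= late
                        return late             # unlink the stragglers here
                    return set()
        ```

        A drop issued mid-compaction thus never races with an open file: the compaction finishes with its inputs intact, then cleans them up itself.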
        Pavel Yaskevich added a comment -

        I don't know which one is better, though, because if compaction fails for some reason in that scenario, wouldn't that mean that all the SSTables left behind stay around until somebody manually deletes them (or a restart drops them)? We would have to add complexity to the schema merge just to handle that case, as well as on the local side...

        Pavel Yaskevich added a comment -

        The other way would be to 'mark a CF for delete' and return to the user right away (making the CF invisible to users), sending the drop request to the other nodes, where they would apply the same thing (try to stop all running compactions, wait until they are done) and drop.

        Jonathan Ellis added a comment -

        wouldn't that mean that all SSTables that were left behind are staying until somebody manually deletes them (or restart would drop them)?

        We already clean up partially-written sstables after compaction failure; I don't see why we couldn't use similar logic here.

        Pavel Yaskevich added a comment -

        The problem I see with that is that we need to take a snapshot before we start dropping or deleting any CF files, so it's probably better to make the drop 'deferred' until running compactions are stopped, so that we have a consistent view of the files we would have to operate upon.

        Jonathan Ellis added a comment -

        DataTracker already makes sstable changes atomic, though. At any time you can snapshot with that and get a consistent view.

        Jonathan Ellis added a comment -

        Tyler, can you still reproduce after the recent schema fixes on the 1.1 branch?

        Tyler Patterson added a comment -

        Yes, the error just happened again for me. I did a fresh pull on branch cassandra-1.1.

        Pavel Yaskevich added a comment -

        Interesting, I can't reproduce it myself. Can you please run it with the attached logging patch (and DEBUG logging enabled) and attach the debug log from your C* node to this task, so I can check what is happening inside DataTracker in your case?

        Tyler Patterson added a comment -

        This was after applying both patches to the cassandra-1.1 branch, and setting logging to DEBUG.

        Tyler Patterson added a comment -

        Somehow that server.log did not have the debug info. Looking into it now.

        Tyler Patterson added a comment -

        Debug is enabled now; it looks like CCM overwrites the log level. Only the logging patch was applied in this run.

        Pavel Yaskevich added a comment -

        I can see the debug information I added now, but the IOError described in this task is not in that log...

        Tyler Patterson added a comment (edited) -

        So after some experimentation, the problem only happens when the log level is set to INFO; it doesn't happen at DEBUG. Gotta love these ones! I modified the logging patch to do logging.info() rather than logging.debug(), and the problem still happens, so at least you can see those debug messages. I hope this is enough to go on.

        Pavel Yaskevich added a comment -

        Hah, now I know what is causing it - it's not a drop problem. The situation is triggered when you re-create a ColumnFamily right after a drop (before all its SSTables were actually deleted by the background task): it reads all the SSTables in the directory back into the system and tries to compact them simultaneously with them being deleted in the background. That is why we warn people to avoid making any modifications to active CFs - otherwise it can lead to strange situations like this one.
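        The interleaving can be reproduced in miniature with plain files. This is a toy Python sketch of the ordering described above, not Cassandra code; only the file name is taken from the error in the description:

        ```python
        import os
        import tempfile
        import threading

        # DROP schedules file deletion on a background task; an immediate CREATE
        # re-scans the data directory and re-adopts the doomed file; a later read
        # (compaction, in Cassandra) then hits a file unlinked underneath it.

        datadir = tempfile.mkdtemp()
        path = os.path.join(datadir, 'Keyspace1-Standard1-hc-47-Data.db')
        with open(path, 'w') as f:
            f.write('sstable bytes')

        adopted_event = threading.Event()

        def background_delete():
            adopted_event.wait()      # deletion lags behind the DROP
            os.unlink(path)

        deleter = threading.Thread(target=background_delete)
        deleter.start()

        # CREATE COLUMNFAMILY re-reads the directory and still sees the old file
        adopted = os.listdir(datadir)
        adopted_event.set()
        deleter.join()                # the background deletion has now won the race

        compaction_failed = False
        try:
            with open(path) as f:     # "compaction" opens the re-adopted file
                f.read()
        except FileNotFoundError:
            compaction_failed = True  # the FileNotFoundException in the stack trace
        ```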

        Jonathan Ellis added a comment -

        Would this be fixed by CASSANDRA-3794 then, since old and new CF will have different IDs?

        Pavel Yaskevich added a comment -

        Not really, because it generates the UUID from ksName + cfName, to be able to make it the same across all machines independent of their state.

        Jonathan Ellis added a comment -

        Should we just add a call to abort in-progress compactions at drop time (which will help cleanup happen faster) and call that "as close as we're going to get?"

        Pavel Yaskevich added a comment -

        This is what I did in my patch.

        Jonathan Ellis added a comment -
        • stopCompactionFor should take CFS parameters instead of String
        • I don't see any reason to not wait indefinitely here; in fact, if we make sure to wait until compaction finishes, the odds are much better that when we tell the client "all done" he won't be able to send a "create" quickly enough to hit the bug
        • Need to call stopCompactionFor on every replica, not just CompactionServer – move this to DefsTable.dropColumnFamily?
        Pavel Yaskevich added a comment -

        stopCompactionFor should take CFS parameters instead of String

        I don't really follow here - do you want it to take a list of CFMetaData instead of String? String is better suited because CompactionInfo.getColumnFamily() returns a String (the CF name).

        I don't see any reason to not wait indefinitely here; in fact, if we make sure to wait until compaction finishes, the odds are much better that when we tell the client "all done" he won't be able to send a "create" quickly enough to hit the bug

        We don't actually wait indefinitely here, just for 30 seconds (worst case); if compactions don't finish by then we just move on with the delete. Do you want it to wait until all compactions finish?

        Need to call stopCompactionFor on every replica, not just CompactionServer – move this to DefsTable.dropColumnFamily?

        I agree, I'm going to move that into dropColumnFamily call so it gets called on the replicas too.

        Jonathan Ellis added a comment -

        String is better suited because CompactionInfo.getColumnFamily() returns a String

        Feel free to fix that.

        if compactions don't finish until then we just move on with delete

        It throws IOException.

        Remember that we check for ability to abort compaction every row; if we're compacting a wide row, it could easily take over 30s w/ throttling.
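        The per-row abort check works roughly like this. A minimal Python sketch (Cassandra's actual check lives in its Java compaction loop; all names here are invented):

        ```python
        import threading

        class StoppableCompaction:
            """Cooperative cancellation checked once per row: a stop request only
            takes effect at the next row boundary, which is why a single wide row
            can hold a compaction well past any fixed timeout."""

            def __init__(self, rows):
                self.rows = rows
                self.stop_requested = threading.Event()
                self.rows_written = 0

            def stop(self):
                self.stop_requested.set()   # honored at the next per-row check

            def run(self):
                for row in self.rows:
                    if self.stop_requested.is_set():
                        raise InterruptedError('compaction interrupted')
                    self.rows_written += 1  # stand-in for merging/writing the row
        ```

        If stop() is never called, run() processes every row; if it is called mid-row, nothing happens until that row is fully written, so a fixed 30-second wait offers no guarantee.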

        Pavel Yaskevich added a comment -

        It throws IOException. Remember that we check for ability to abort compaction every row; if we're compacting a wide row, it could easily take over 30s w/ throttling.

        Oh yes, sorry. I think we can just remove that exception and move on with the drop - or do you want it to wait until all compactions finish?

        Jonathan Ellis added a comment -

        v3 attached. Removes CompactionInfo fields that are redundant with the introduction of CFM, and removes the wait from the stop method (it doesn't help clean up the sstables involved any faster, so there is no point in slowing down the drop for it).

        Pavel Yaskevich added a comment -

        Committed, with a nit in CompactionInfo.getColumnFamily() to return cfName instead of the ksName it returned in v3.


          People

          • Assignee: Pavel Yaskevich
          • Reporter: Tyler Patterson
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 5
