CASSANDRA-4219

Problem with creating keyspace after drop

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: None
    • Labels:
      None
    • Environment:

      Debian 6.0.4 x64

      Description

      Hi,

      I'm doing testing and wanted to drop a keyspace (with a column family) to re-add it with a different strategy. So I ran in cqlsh:

      DROP KEYSPACE PlayLog;

      CREATE KEYSPACE PlayLog WITH strategy_class = 'SimpleStrategy'
      AND strategy_options:replication_factor = 2;

      And everything seemed to be fine. I ran some inserts, which also seemed to go fine, but then selecting them gave me:

      cqlsh:PlayLog> select count(*) from playlog;
      TSocket read 0 bytes

      I wasn't sure what was wrong, so I tried dropping and creating again, and now when I try to create I get:

      cqlsh> CREATE KEYSPACE PlayLog WITH strategy_class = 'SimpleStrategy'
      ... AND strategy_options:replication_factor = 2;
      TSocket read 0 bytes

      And the keyspace doesn't get created. In the log it shows:

      ERROR [Thrift:4] 2012-05-03 18:23:05,124 CustomTThreadPoolServer.java (line 204) Error occurred during processing of message.
      java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
      at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:372)
      at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:191)
      at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:129)
      at org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:701)
      at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:875)
      at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1235)
      at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3458)
      at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3446)
      at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
      at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
      at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError
      at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
      at java.util.concurrent.FutureTask.get(Unknown Source)
      at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:368)
      ... 13 more
      Caused by: java.lang.AssertionError
      at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:441)
      at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:339)
      at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:269)
      at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      ... 3 more
      ERROR [MigrationStage:1] 2012-05-03 18:23:05,124 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[MigrationStage:1,5,main]
      java.lang.AssertionError
      at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:441)
      at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:339)
      at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:269)
      at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
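The nesting in the trace above (RuntimeException, then ExecutionException, then AssertionError) is what Java produces when a task fails on a worker thread and the caller rethrows the failure unchecked. The following is a minimal sketch of that pattern only. `WaitOnFutureSketch`, `waitOnFuture` and `failOnWorkerThread` are hypothetical names used for illustration; this is not Cassandra's `FBUtilities` code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for illustration; not Cassandra's FBUtilities.
final class WaitOnFutureSketch {

    // Blocks on the future and rethrows any failure as an unchecked exception,
    // which yields the RuntimeException -> ExecutionException -> cause chain.
    static <T> T waitOnFuture(Future<T> future) {
        try {
            return future.get();
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // Runs a task that fails with an AssertionError on a worker thread and
    // returns the exception observed by the calling thread.
    static RuntimeException failOnWorkerThread() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<Void> task = () -> { throw new AssertionError(); };
            waitOnFuture(worker.submit(task));
            return null; // not reached: the task always fails
        } catch (RuntimeException e) {
            return e;
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        // Print the class of each exception in the cause chain.
        Throwable t = failOnWorkerThread();
        while (t != null) {
            System.out.println(t.getClass().getName());
            t = t.getCause();
        }
    }
}
```

Read this way, the innermost "Caused by" frame (`DefsTable.updateKeyspace`) is where the failure originates; the outer frames only show how it travelled back to the Thrift thread.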

      Any ideas how I can recover from this?

      I am running version 1.1.0 and have tried nodetool repair, cleanup, compact. I can create other keyspaces, but still can't create a keyspace called PlayLog even though it is not listed anywhere.

      Jeff

      1. CASSANDRA-4219.patch
        1 kB
        Pavel Yaskevich
      2. system-91.223.192.26.log.gz
        642 kB
        Jeff Williams
      3. system-startup-debug.log.gz
        104 kB
        Jeff Williams
      4. system-debug.log.gz
        493 kB
        Jeff Williams
      5. system.log.gz
        19 kB
        Jeff Williams
      6. 0001-Add-debug-logs.txt
        2 kB
        Sylvain Lebresne

        Activity

        Michael Harris added a comment -

        Your bug-tracking system isn't the right place to report that a bug that was supposed to be fixed might have a regression in a later version...? Ok, I'll mail the list then...

        Brandon Williams added a comment -

        Before 1.1.7, all bets are off for schema problems. Jira is not the best support forum; try the mailing list or IRC.

        Michael Harris added a comment -

        I am seeing this issue on version 1.1.6 as well upon dropping and recreating a keyspace. Any idea whether there might have been a regression between the fix of this issue and the release of 1.1.6?

        java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:373)
        at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:194)
        at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:127)
        at org.apache.cassandra.thrift.CassandraServer.system_add_keyspace(CassandraServer.java:992)
        ... (redacted)
        Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:369)
        ... 64 more
        Caused by: java.lang.NullPointerException
        at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:518)
        at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:415)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:345)
        at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:217)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more

        Pavel Yaskevich added a comment -

        Hi Doug, I think the best option would be to simply apply the patch from this issue and recompile Cassandra.

        Douglas Muth added a comment -

        Good Afternoon,

        It turns out that I am having this exact issue, and I found this bug via Google. As previously stated, I dropped a keyspace during testing, but was unable to re-create it. Looking in system.schema_keyspaces still shows an entry for that keyspace, but I cannot drop that keyspace either.

        First, is there anything I can do as a workaround, short of deleting all of my data and starting over?

        Second, this isn't in a cluster, it's a single machine, and a development machine at that. So there's no confidential data involved. If it would be of any help whatsoever, I'd be happy to send copies of the entire /var/lib/cassandra/ directory. Please let me know!

        Thanks for your time,

        – Doug

        Pavel Yaskevich added a comment -

        Committed.

        Jeff Williams added a comment -

        I've done some thorough testing now and this looks fixed. Thanks guys.

        Jonathan Ellis added a comment -

        +1

        Pavel Yaskevich added a comment -

        OK, I understand your worries. We use CFS.getRangeSlice to fetch the data, which calls removeDeleted while processing the request.

        Jonathan Ellis added a comment -

        "Don't think that we should worry about removeDeleted because isMarkedForDelete + isEmpty gives us sufficient information"

        To clarify: what I'm concerned about is that if there is a row-level tombstone against a previously existing CF definition, then row.isEmpty will be false, so (markedForDelete && isEmpty) will also be false...
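The concern can be made concrete with a small model. This is a sketch under stated assumptions, not Cassandra's implementation: `RowTombstoneSketch`, `Row`, `removeDeleted` and `looksDropped` are hypothetical stand-ins, and the only rule modelled is that a row-level tombstone shadows every column whose timestamp is not newer than the tombstone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of a row carrying a row-level tombstone.
final class RowTombstoneSketch {

    static final class Row {
        // Timestamp of the row-level tombstone; Long.MIN_VALUE means "never deleted".
        long markedForDeleteAt = Long.MIN_VALUE;
        // Column name -> write timestamp (values omitted for brevity).
        final Map<String, Long> columns = new LinkedHashMap<>();

        boolean isMarkedForDelete() { return markedForDeleteAt > Long.MIN_VALUE; }
        boolean isEmpty() { return columns.isEmpty(); }
    }

    // Returns a copy that keeps only the columns written after the row tombstone.
    static Row removeDeleted(Row row) {
        Row live = new Row();
        live.markedForDeleteAt = row.markedForDeleteAt;
        for (Map.Entry<String, Long> c : row.columns.entrySet()) {
            if (c.getValue() > row.markedForDeleteAt) {
                live.columns.put(c.getKey(), c.getValue());
            }
        }
        return live;
    }

    // The combined check discussed in this thread.
    static boolean looksDropped(Row row) {
        return row.isMarkedForDelete() && row.isEmpty();
    }

    public static void main(String[] args) {
        Row row = new Row();
        row.columns.put("strategy_class", 5L); // written before the drop
        row.markedForDeleteAt = 10L;           // the drop places a row tombstone
        System.out.println("raw row looks dropped:      " + looksDropped(row));
        System.out.println("filtered row looks dropped: " + looksDropped(removeDeleted(row)));
    }
}
```

In this model the combined check only gives the intended answer on a row that has already been filtered, which is why it matters whether the read path applies removeDeleted before the check runs.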

        Pavel Yaskevich added a comment -

        No, we only check row.cf.isEmpty when the row was marked for delete, which means the KS/CF was either actually deleted (empty, but the row is still there) or re-created. Don't think that we should worry about removeDeleted because isMarkedForDelete + isEmpty gives us sufficient information.

        Jonathan Ellis added a comment -

        Do we need to test for row.cf.isEmpty when there was no deletion involved?

        Do we need a removeDeleted call in there so the row tombstone can suppress obsolete columns pre-compaction?

        Jeff Williams added a comment - - edited

        Just did a quick test and it is looking good. The keyspace doesn't disappear! I'll do some more checks to make sure that all of the data inserts work before and after.

        Pavel Yaskevich added a comment -

        Jeff, I have run both scenarios you mentioned to check that everything is now working as expected. Can you please confirm (just to double-check) that it works on your side too?

        Pavel Yaskevich added a comment -

        The problem was that when a KS/CF is deleted, its row in the system table is marked for delete and all of its columns are removed. When it is re-created, the columns are added again but the row stays marked for delete. So when we regenerate or load the schema version, we need to check both that the given KS/CF has no attributes in its system table and that it is marked for delete.
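A minimal sketch of that explanation, assuming the pre-fix check looked only at the tombstone flag (the thread implies this but does not show the code). `SchemaReloadSketch`, `SchemaRow`, `ignoreBuggy` and `ignoreFixed` are hypothetical names, and the in-memory map merely stands in for the schema system table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a schema system table; not Cassandra's classes.
final class SchemaReloadSketch {

    // One row per keyspace: a tombstone flag plus the attribute columns.
    static final class SchemaRow {
        boolean markedForDelete = false;
        final Map<String, String> columns = new LinkedHashMap<>();
    }

    final Map<String, SchemaRow> systemTable = new LinkedHashMap<>();

    void create(String keyspace, String strategy) {
        // Re-creating adds the columns again but leaves an existing tombstone flag set.
        systemTable.computeIfAbsent(keyspace, k -> new SchemaRow())
                   .columns.put("strategy_class", strategy);
    }

    void drop(String keyspace) {
        SchemaRow row = systemTable.get(keyspace);
        if (row != null) {
            row.markedForDelete = true; // the row stays, flagged as deleted
            row.columns.clear();        // all of its columns are removed
        }
    }

    // Assumed pre-fix behaviour: any tombstoned row is treated as dropped.
    static boolean ignoreBuggy(SchemaRow row) {
        return row.markedForDelete;
    }

    // Behaviour described as the fix: dropped only if tombstoned AND without attributes.
    static boolean ignoreFixed(SchemaRow row) {
        return row.markedForDelete && row.columns.isEmpty();
    }

    // The keyspaces a node would load on restart under either check.
    List<String> load(boolean fixed) {
        List<String> loaded = new ArrayList<>();
        for (Map.Entry<String, SchemaRow> e : systemTable.entrySet()) {
            SchemaRow row = e.getValue();
            boolean ignore = fixed ? ignoreFixed(row) : ignoreBuggy(row);
            if (!ignore) {
                loaded.add(e.getKey());
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        SchemaReloadSketch node = new SchemaReloadSketch();
        node.create("PlayLog", "SimpleStrategy");
        node.drop("PlayLog");
        node.create("PlayLog", "SimpleStrategy");
        System.out.println("loaded with tombstone-only check: " + node.load(false));
        System.out.println("loaded with combined check:       " + node.load(true));
    }
}
```

Under these assumptions the model reproduces the reported symptom: after create, drop and re-create, a reload with the tombstone-only check loses the keyspace, while the combined check keeps it and still ignores a keyspace that was dropped and never re-created.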

        Pavel Yaskevich added a comment -

        Yeah, I can reproduce this one. It happens because re-creating the keyspace does not change the UUID version from the original, so when the migration is sent to node2 it is not merged correctly into the system table. I will try to fix this one ASAP.
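One way to picture an unchanged version is the sketch below. It assumes the schema version is a UUID derived from a digest of the schema rows a node counts as valid; that is an assumption made for illustration, since the comment only says the UUID version does not change. `SchemaVersionSketch`, `version` and the row string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration of a content-derived schema version.
final class SchemaVersionSketch {

    // Derives a name-based UUID (MD5 under the hood) from the rows that are counted.
    static UUID version(List<String> countedRows) {
        String joined = String.join("\n", countedRows);
        return UUID.nameUUIDFromBytes(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID withoutKeyspace = version(Collections.<String>emptyList());
        // The re-created row exists but is ignored, so nothing is counted.
        UUID rowSkipped = version(Collections.<String>emptyList());
        // The re-created row is counted.
        UUID rowCounted = version(Arrays.asList("PlayLog:SimpleStrategy:2"));
        System.out.println("skipped row changes version: " + !withoutKeyspace.equals(rowSkipped));
        System.out.println("counted row changes version: " + !withoutKeyspace.equals(rowCounted));
    }
}
```

If a re-created row is skipped when the digest is computed, the resulting version cannot be told apart from a schema that lacks the keyspace, which would be consistent with a peer seeing nothing new to merge.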

        Jeff Williams added a comment -

        Pavel,

        I am able to recreate this with a freshly installed cluster using the Debian 1.1 packages. The steps are:

        1. Set up the cluster (I only had 2 nodes in my test)
        2. Create the keyspace on node1 (confirm created on node2)
        3. Drop the keyspace on node1 (confirm dropped on node2)
        4. Re-create the keyspace on node1 (confirm re-created on node2)
        5. Restart node2
        6. The keyspace no longer exists on node2

        Regards,
        Jeff

        Pavel Yaskevich added a comment -

        Hi Jeff, unfortunately I wasn't able to reproduce the situation you are seeing using the following steps:

        1. Ran a ccm cluster with 3 nodes
        2. Created a keyspace/CF and added some data
        3. Stopped node 2
        4. Dropped the keyspace (using the CLI from node 1)
        5. Re-created the keyspace and column family (using the CLI from node 3, and on another try from node 1)
        6. Added some data to the keyspace
        7. Started node 2

        I have seen some 'Couldn't find cfId=X' errors, but this is unavoidable since we use sequential numbering of CFs and you have re-created one with the same name. We also have an issue open to switch to UUID ids (CASSANDRA-3794).

        Can you try running the 'resetlocalschema' nodetool command on the failing node? It truncates all of the schema system tables, then tries to request the schema again and re-apply it. It was designed specifically to resolve such weird situations.

        Jeff Williams added a comment -

        Hi Pavel,

        Have you been able to reproduce this? I want to use these servers for production traffic and am wondering whether this issue is a general bug or due to corruption in my cluster.

        Regards,
        Jeff

        Jeff Williams added a comment -

        Ok, I can now reproduce this on my cluster.

        If I start with all three servers running and, on one of the servers, create a keyspace, create a column family and test, it all works fine. If I then drop the keyspace and re-create it, everything continues to work. However, as soon as one of the nodes is restarted, the keyspace disappears on that node. If I restart every node in the cluster, the keyspace cannot be seen anywhere, yet I still cannot create a keyspace with that name.

        Jeff Williams added a comment -

        Anything useful in that log?

        I seem to have replicated the issue somehow. First, I moved the servers onto public IPs, though the last octet is the same:

        nodetool -h meta01 ring
        Address         DC   Rack  Status  State   Load      Owns    Token
                                                                     113427455640312821154458202477256070485
        91.223.192.25   CPH  R1    Up      Normal  11.2 MB   33.33%  0
        91.223.192.26   CPH  R1    Up      Normal  15.16 MB  33.33%  56713727820156410577229101238628035242
        91.223.192.24   CPH  R1    Up      Normal  20.11 MB  33.33%  113427455640312821154458202477256070485

        I created a new keyspace PlayLog2 (PlayLog still does not work), and a column family playlog.

        This was available on all nodes. I then ran a few test inserts which worked fine. Then, to test fail-over, I shutdown the node 91.223.192.26 during inserts. The inserts completed fine and a while later I restarted the node 91.223.192.26. Then, when I went to re-run my tests, I see (Hector client):

        5710 [Thread-1] DEBUG me.prettyprint.cassandra.connection.client.HThriftClient - Creating a new thrift connection to meta02.cph.aspiro.com(91.223.192.26):9160
        5711 [Thread-0] DEBUG me.prettyprint.cassandra.connection.client.HThriftClient - keyspace reseting from null to PlayLog2
        Exception in thread "Thread-1" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Keyspace PlayLog2 does not exist)

        Sure enough, from command line client on 91.223.192.26, I see no PlayLog2 keyspace, yet it exists on 91.223.192.24 and 91.223.192.25. I have attached the system.log from 91.223.192.26 in the hope that it is useful.

        Jeff Williams added a comment -

        Debug log for node startup

        Sylvain Lebresne added a comment -

        The weird part here is that this DEBUG log seems to say that the keyspace should exist before you even do the create keyspace (but if it was correctly loaded, you should get a different error, so something is wrong). Could you try restarting the node with DEBUG logging and attach that log (just wait for the node to be up and running)? I want to try to see what's going on with the loading of that keyspace.

        Jeff Williams added a comment -

        Log for adding Keyspace with debug patch applied and log mode DEBUG.

        Jeff Williams added a comment -

        I have attached the entire system.log from yesterday for the server I was running cqlsh on (10.20.20.25). The cluster is:

        root@meta01:~# nodetool -h meta01 ring PlayLog3
        Address         DC   Rack  Status  State   Load      Effective-Owership  Token
                                                                                 113427455640312821154458202477256070485
        10.20.20.25     CPH  R1    Up      Normal  27.88 MB  66.67%              0
        10.20.20.26     CPH  R1    Up      Normal  17.5 MB   66.67%              56713727820156410577229101238628035242
        10.20.20.24     CPH  R1    Up      Normal  72.44 MB  66.67%              113427455640312821154458202477256070485

        However, I have switched from SimpleSnitch to PropertyFileSnitch this morning.

        I'm not sure about the times I ran the commands and where that corresponds to in the log, but I'm guessing you may know.

        I was testing fail-over and was shutting down the server 10.20.20.26 during writes as a test. However, it looks like I took it down at 2012-05-03 16:07:13 and that it was down when I did the drop and create, which could be the cause. I see the first error soon after it came back up. Also, I remember that these 'Couldn't find cfId=1013' errors occurred in the system.log on the other servers at the same time (I can send these if you want).

        I'm currently running from the Debian packages. I'll try to apply the patch there and reinstall the packages.

        Sylvain Lebresne added a comment -

        Would it be possible for you to apply the attached patch (0001-Add-debug-logs.txt) on one of the machines, switch the log to DEBUG, try recreating said keyspace and send us the resulting log? This should give us more info on what's going on.

        Also, do you still have the log from when you first got the 'TSocket read 0 bytes' during a select? Are there any corresponding errors?


          People

          • Assignee:
            Pavel Yaskevich
            Reporter:
            Jeff Williams
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            5
