Cassandra
  1. Cassandra
  2. CASSANDRA-4880

Endless loop flushing+compacting system/schema_keyspaces and system/schema_columnfamilies

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.7, 1.2.0 beta 3
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Linux x86_64 3.4.9, sun-jdk 1.6.0_33

      Description

      After upgrading a node from 1.1.2 to 1.1.6, the startup sequence entered a loop as seen here:

      http://mina.naguib.ca/misc/cassandra_116_startup_loop.txt

      Stopping and starting the node entered the same loop.

      Reverting back to 1.1.2 started successfully.

      1. CASSANDRA-4880.patch
        7 kB
        Pavel Yaskevich
      2. CASSANDRA-4880-fix.patch
        3 kB
        Pavel Yaskevich
      3. 131203-schema-1.txt
        1 kB
        Wei-dun Teng
      4. 131203-schema-2.txt
        1 kB
        Wei-dun Teng

        Issue Links

          Activity

          Hide
          Cyril Scetbon added a comment -

          Done. see CASSANDRA-6614

          Show
          Cyril Scetbon added a comment - Done. see CASSANDRA-6614
          Hide
          Brandon Williams added a comment -

          This ticket was closed more than a year ago. Please open a new ticket if you encounter the issue, because it's probably not related to this one.

          Show
          Brandon Williams added a comment - This ticket was closed more than a year ago. Please open a new ticket if you encounter the issue, because it's probably not related to this one.
          Hide
          Cyril Scetbon added a comment -

          When I upgrade one node from 1.2.2 to 1.2.13 for 2h I get the previous messages with a raise of CPU(as it flush and compact continually) on all nodes http://picpaste.com/pics/Screen_Shot_2014-01-24_at_09.18.50-ggcCDVqd.1390551670.png
          After that, everything is fine and I can upgrade other nodes without any important raise of cpus load. when I start the upgrade, the more nodes I upgrade at the same time (at the beginning), the higher the cpu load is http://picpaste.com/pics/Screen_Shot_2014-01-23_at_17.45.56-I3fdEQ2T.1390552036.png

          Show
          Cyril Scetbon added a comment - When I upgrade one node from 1.2.2 to 1.2.13 for 2h I get the previous messages with a raise of CPU(as it flush and compact continually) on all nodes http://picpaste.com/pics/Screen_Shot_2014-01-24_at_09.18.50-ggcCDVqd.1390551670.png After that, everything is fine and I can upgrade other nodes without any important raise of cpus load. when I start the upgrade, the more nodes I upgrade at the same time (at the beginning), the higher the cpu load is http://picpaste.com/pics/Screen_Shot_2014-01-23_at_17.45.56-I3fdEQ2T.1390552036.png
          Hide
          Cyril Scetbon added a comment - - edited

          I can easily reproduce it by having one DC with 1.2.2 and another with 1.2.12/1.2.13. see http://pastebin.com/YZKUQLXz
          If I grep for only InternalResponseStage logs I get http://pastebin.com/htnXZCiT which always displays same account of ops and serialized/live bytes per column family. Column family system/schema_columns is also concerned by this issue.

          Show
          Cyril Scetbon added a comment - - edited I can easily reproduce it by having one DC with 1.2.2 and another with 1.2.12/1.2.13. see http://pastebin.com/YZKUQLXz If I grep for only InternalResponseStage logs I get http://pastebin.com/htnXZCiT which always displays same account of ops and serialized/live bytes per column family. Column family system/schema_columns is also concerned by this issue.
          Hide
          Wei-dun Teng added a comment -

          Run the node with 1.2.2 again, and the schema converged. Further upgrading JVM to openjdk-7.25.15_2 and Cassandra to 1.2.12 didn't reproduce the problem. Strange, but at least my cluster now upgrading fine ...

          Show
          Wei-dun Teng added a comment - Run the node with 1.2.2 again, and the schema converged. Further upgrading JVM to openjdk-7.25.15_2 and Cassandra to 1.2.12 didn't reproduce the problem. Strange, but at least my cluster now upgrading fine ...
          Hide
          Wei-dun Teng added a comment -

          I have encountered similar bug after I upgrade one of my 1.2.2 node to 1.2.12.
          Was using FreeBSD 8.2 + diablo-jre-1.6.0.07.02_18.

          Before I upgrading the node to 1.2.12, I changed JVM to openjdk-7.25.15_2 (in retrospective probably not a good idea ...), saw flipping Memtable flushes, changes JVM back to diablo-jre-1.6.0.07.02_18, and still seeing rapid flushes. Now I've taken the 1.2.12 node offline (but not decommissioned it).

          After that I see tens of Memtable flushes on the 1.2.12 per second, while once or twice a second on nodes with 1.2.2.

          Attached files are the flipping schema versions.

          Show
          Wei-dun Teng added a comment - I have encountered similar bug after I upgrade one of my 1.2.2 node to 1.2.12. Was using FreeBSD 8.2 + diablo-jre-1.6.0.07.02_18. Before I upgrading the node to 1.2.12, I changed JVM to openjdk-7.25.15_2 (in retrospective probably not a good idea ...), saw flipping Memtable flushes, changes JVM back to diablo-jre-1.6.0.07.02_18, and still seeing rapid flushes. Now I've taken the 1.2.12 node offline (but not decommissioned it). After that I see tens of Memtable flushes on the 1.2.12 per second, while once or twice a second on nodes with 1.2.2. Attached files are the flipping schema versions.
          Hide
          Pavel Yaskevich added a comment -

          Committed.

          Show
          Pavel Yaskevich added a comment - Committed.
          Hide
          Brandon Williams added a comment -

          +1

          Show
          Brandon Williams added a comment - +1
          Hide
          Pavel Yaskevich added a comment -

          We don't need special RM deserialization handler any more, verified that tests are now passing.

          Show
          Pavel Yaskevich added a comment - We don't need special RM deserialization handler any more, verified that tests are now passing.
          Hide
          Pavel Yaskevich added a comment -

          Interesting... Ok, I will try to figure it out asap.

          Show
          Pavel Yaskevich added a comment - Interesting... Ok, I will try to figure it out asap.
          Hide
          Brandon Williams added a comment -

          Nope, those tests all run with the same version.

          Show
          Brandon Williams added a comment - Nope, those tests all run with the same version.
          Hide
          Pavel Yaskevich added a comment -

          Could that be that those two stub nodes running on different versions - one on recent and one on previous? That would explain why they have disagreement. It's tested on rolling restart by CASSANDRA-4837.

          Show
          Pavel Yaskevich added a comment - Could that be that those two stub nodes running on different versions - one on recent and one on previous? That would explain why they have disagreement. It's tested on rolling restart by CASSANDRA-4837 .
          Hide
          Brandon Williams added a comment -

          I think something's still wrong in 1.1, concurrent schema changes aren't settling: http://buildbot.datastax.com:8010/builders/cassandra-1.1/builds/567/steps/shell/logs/stdio

          Everything works in 1.2.

          Show
          Brandon Williams added a comment - I think something's still wrong in 1.1, concurrent schema changes aren't settling: http://buildbot.datastax.com:8010/builders/cassandra-1.1/builds/567/steps/shell/logs/stdio Everything works in 1.2.
          Hide
          Pavel Yaskevich added a comment -

          Committed.

          Show
          Pavel Yaskevich added a comment - Committed.
          Hide
          Brandon Williams added a comment -

          Hmm, this unfortunate but I don't see any way around it, either. +1

          Show
          Brandon Williams added a comment - Hmm, this unfortunate but I don't see any way around it, either. +1
          Hide
          Pavel Yaskevich added a comment -

          Assigning Brandon as reviewer because he was initially involved.

          Show
          Pavel Yaskevich added a comment - Assigning Brandon as reviewer because he was initially involved.
          Hide
          Pavel Yaskevich added a comment - - edited

          I made start timestamp schema timestamp repair account and fix any timestamps that are grater than current timestamp and introduced new version in MessagingVersion as it's the only way to fix infinite loop in schema due to broken nano timestamps that we can't properly fix in <= 1.1.6.

          Edit: Also note that once this is committed we would have to increase VERSION number for cassandra-1.2.

          Show
          Pavel Yaskevich added a comment - - edited I made start timestamp schema timestamp repair account and fix any timestamps that are grater than current timestamp and introduced new version in MessagingVersion as it's the only way to fix infinite loop in schema due to broken nano timestamps that we can't properly fix in <= 1.1.6. Edit: Also note that once this is committed we would have to increase VERSION number for cassandra-1.2.
          Hide
          Pavel Yaskevich added a comment -

          it looks like it wasn't a good idea to involve timestamps in computing schema version (was super easy to do), I will work on improving things to make a migration from older versions (which System.nanoTime() bug) as smooth as possible.

          Show
          Pavel Yaskevich added a comment - it looks like it wasn't a good idea to involve timestamps in computing schema version (was super easy to do), I will work on improving things to make a migration from older versions (which System.nanoTime() bug) as smooth as possible.
          Hide
          Mike Heffner added a comment -

          This is after we removed the 1.1.6 node, but:

          Using default limit of 100
          Using default column limit of 100
          -------------------
          RowKey: KSName
          => (column=durable_writes, value=true, timestamp=5397847727343)
          => (column=name, value=KSName, timestamp=5397847727343)
          => (column=strategy_class, value=org.apache.cassandra.locator.NetworkTopologyStrategy, timestamp=5397847727343)
          => (column=strategy_options, value={"us-east":"3"}, timestamp=5397847727343)
          
          1 Row Returned.
          Elapsed time: 132 msec(s).
          
          Show
          Mike Heffner added a comment - This is after we removed the 1.1.6 node, but: Using default limit of 100 Using default column limit of 100 ------------------- RowKey: KSName => (column=durable_writes, value= true , timestamp=5397847727343) => (column=name, value=KSName, timestamp=5397847727343) => (column=strategy_class, value=org.apache.cassandra.locator.NetworkTopologyStrategy, timestamp=5397847727343) => (column=strategy_options, value={ "us-east" : "3" }, timestamp=5397847727343) 1 Row Returned. Elapsed time: 132 msec(s).
          Hide
          Mina Naguib added a comment -

          Pavel Yaskevich Here's mine (after some very minor name obfuscation):

          [default@system] list schema_keyspaces;
          Using default limit of 100
          Using default column limit of 100
          -------------------
          RowKey: AdKS
          => (column=durable_writes, value=true, timestamp=1336233003898)
          => (column=name, value=AdKS, timestamp=1336233003898)
          => (column=strategy_class, value=org.apache.cassandra.locator.NetworkTopologyStrategy, timestamp=1336233003898)
          => (column=strategy_options, value={"DCMTL":"2","DCLA":"2","DCLON":"2"}, timestamp=1336233003898)
          
          1 Row Returned.
          Elapsed time: 3 msec(s).
          

          Tomorrow I'll run a test with cassandra 1.1.6 + CASSANDRA-4837 just to add a second vote to Brandon's test whether it helps or not.

          Show
          Mina Naguib added a comment - Pavel Yaskevich Here's mine (after some very minor name obfuscation): [ default @system] list schema_keyspaces; Using default limit of 100 Using default column limit of 100 ------------------- RowKey: AdKS => (column=durable_writes, value= true , timestamp=1336233003898) => (column=name, value=AdKS, timestamp=1336233003898) => (column=strategy_class, value=org.apache.cassandra.locator.NetworkTopologyStrategy, timestamp=1336233003898) => (column=strategy_options, value={ "DCMTL" : "2" , "DCLA" : "2" , "DCLON" : "2" }, timestamp=1336233003898) 1 Row Returned. Elapsed time: 3 msec(s). Tomorrow I'll run a test with cassandra 1.1.6 + CASSANDRA-4837 just to add a second vote to Brandon's test whether it helps or not.
          Hide
          Pavel Yaskevich added a comment -

          Mike Heffner and Mina Naguib Can you please execute following commands in CLI and post output here - "use system;" and "list schema_keyspaces"?

          Show
          Pavel Yaskevich added a comment - Mike Heffner and Mina Naguib Can you please execute following commands in CLI and post output here - "use system;" and "list schema_keyspaces"?
          Hide
          Brandon Williams added a comment -

          Sent to Pavel privately.

          Show
          Brandon Williams added a comment - Sent to Pavel privately.
          Hide
          Pavel Yaskevich added a comment -

          Brandon Williams Can you attach schema_* sstables that could trigger this behavior to the ticket please (if you have them)?

          Show
          Pavel Yaskevich added a comment - Brandon Williams Can you attach schema_* sstables that could trigger this behavior to the ticket please (if you have them)?
          Hide
          Brandon Williams added a comment -

          No help

          Show
          Brandon Williams added a comment - No help
          Hide
          Pavel Yaskevich added a comment -

          can you try 1.1.6 + CASSANDRA-4837 ?

          Show
          Pavel Yaskevich added a comment - can you try 1.1.6 + CASSANDRA-4837 ?
          Hide
          Brandon Williams added a comment -

          Bisects to CASSANDRA-4561

          Show
          Brandon Williams added a comment - Bisects to CASSANDRA-4561
          Hide
          Brandon Williams added a comment -

          I'm pretty sure that's what's happening, but it's not clear why. You can artificially reproduce this by manually injecting a conflict, such as a "Column family ID mismatch" though that error isn't present here.

          Show
          Brandon Williams added a comment - I'm pretty sure that's what's happening, but it's not clear why. You can artificially reproduce this by manually injecting a conflict, such as a "Column family ID mismatch" though that error isn't present here.
          Hide
          Jonathan Ellis added a comment -

          Is schema synchronization confused? I can't think of any other reason for continuous flush here.

          Show
          Jonathan Ellis added a comment - Is schema synchronization confused? I can't think of any other reason for continuous flush here.
          Hide
          Mike Heffner added a comment -

          We also see this after upgrading a node from 1.1.0 -> 1.1.6 in our ring. We also saw increased flush writes begin continuously on the other 1.1.0 nodes in the ring after the 1.1.6 was started, but not as rapidly as on the 1.1.6 node. There were also periodic HintedHandoffs occurring around the ring after the 1.1.6 node came up.

          Show
          Mike Heffner added a comment - We also see this after upgrading a node from 1.1.0 -> 1.1.6 in our ring. We also saw increased flush writes begin continuously on the other 1.1.0 nodes in the ring after the 1.1.6 was started, but not as rapidly as on the 1.1.6 node. There were also periodic HintedHandoffs occurring around the ring after the 1.1.6 node came up.

            People

            • Assignee:
              Pavel Yaskevich
              Reporter:
              Mina Naguib
              Reviewer:
              Brandon Williams
            • Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development