CASSANDRA-2536: Schema disagreements when using connections to multiple hosts

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.7.6, 0.8.0 beta 2
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Two node 0.8-beta1 cluster with one seed and JNA.

    Description

      If you have two Thrift connections open to different nodes and you create a keyspace (KS) using the first, then a column family (CF) in that KS using the second, you wind up with a schema disagreement, even if you wait/sleep after creating the KS.

      The attached script reproduces the issue using pycassa (1.0.6 should work fine, although it has the 0.7 thrift-gen code). It's also reproducible by hand with two cassandra-cli sessions.
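
      For readers without the attached script handy, here is a rough Java/Thrift sketch of the same reproduction against the 0.8-era Thrift API. It is not the attached pycassa script; the host names, keyspace/CF names, and sleep length are illustrative only.

      {code:java}
      import java.util.Collections;

      import org.apache.cassandra.thrift.Cassandra;
      import org.apache.cassandra.thrift.CfDef;
      import org.apache.cassandra.thrift.KsDef;
      import org.apache.thrift.protocol.TBinaryProtocol;
      import org.apache.thrift.transport.TFramedTransport;
      import org.apache.thrift.transport.TSocket;

      public class SchemaDisagreeRepro
      {
          private static Cassandra.Client connect(String host) throws Exception
          {
              TFramedTransport transport = new TFramedTransport(new TSocket(host, 9160));
              transport.open();
              return new Cassandra.Client(new TBinaryProtocol(transport));
          }

          public static void main(String[] args) throws Exception
          {
              Cassandra.Client node1 = connect("node1.example.com");
              Cassandra.Client node2 = connect("node2.example.com");

              // Create the keyspace through the connection to node1.
              // (On 0.7 the replication factor is a KsDef field rather than a strategy option.)
              KsDef ks = new KsDef();
              ks.setName("TestKS");
              ks.setStrategy_class("org.apache.cassandra.locator.SimpleStrategy");
              ks.setStrategy_options(Collections.singletonMap("replication_factor", "1"));
              ks.setCf_defs(Collections.<CfDef>emptyList());
              node1.system_add_keyspace(ks);

              Thread.sleep(5000); // sleeping here does not avoid the disagreement

              // Create a column family in that keyspace through the connection to node2.
              node2.set_keyspace("TestKS");
              node2.system_add_column_family(new CfDef("TestKS", "TestCF"));

              // More than one schema version among live nodes in this map means the cluster disagrees.
              System.out.println(node1.describe_schema_versions());
          }
      }
      {code}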

    Attachments

      1. 2536-compare-timestamp.txt (0.8 kB, Tyler Hobbs)
      2. schema_disagree.py (1 kB, Tyler Hobbs)

        Activity

        Mike Bulman added a comment -

        I feel like a better, more critical-sounding explanation is: create a keyspace on node1, then create a CF in that keyspace on node2 = hang + schema disagreement.

        Jonathan Ellis added a comment -

        And this is fine in 0.7?

        Tyler Hobbs added a comment -

        Actually, I can also reproduce this with a two node 0.7.4 cluster. I'm pretty sure that this does not happen with 0.7.3, but I'll go ahead and verify that.

        Tyler Hobbs added a comment -

        Never mind my thoughts that it doesn't happen in 0.7.3; it seems to happen there too. It appears this is not a recent problem.

        Brian Lovett added a comment -

        Just bumped into this on a fresh 0.7.4 install on our test cluster. Does this only happen in a 2 node ring?

        Dhaivat Pandit added a comment -

        Encountered this on a fresh 0.7.4 install: 5 nodes, 100G+ per node.

        Decommissioning the bad node and rejoining fixed the problem.

        Jonathan Ellis added a comment -

        Gary, any thoughts on where to start looking?

        Gary Dusbabek added a comment -

        any thoughts...

        I was going to add some JMX to get the last N schema versions (seems like it would be handy anyway, and will be necessary if we ever get the rollback pony). Send schema to node A, verify that the schema is propagated to B, send schema to B, and watch the problem happen. The code to start looking at is the Definitions*VerbHandlers.

        The schema version is tracked in two places: gossip and DatabaseDescriptor.defsVersion. Make sure those are reasonably in sync (that was the source of one bug in the past).
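
        A minimal sketch of the kind of JMX hook described above, assuming migrations would call into it whenever one is applied locally. Nothing here is an existing Cassandra API; the MBean name, method, and history size are hypothetical.

        {code:java}
        // SchemaHistoryMBean.java (hypothetical management interface)
        import java.util.List;

        public interface SchemaHistoryMBean
        {
            /** The last N schema version UUIDs this node has applied, newest first. */
            List<String> getRecentSchemaVersions();
        }

        // SchemaHistory.java (hypothetical implementation, in its own file)
        import java.lang.management.ManagementFactory;
        import java.util.ArrayList;
        import java.util.LinkedList;
        import java.util.List;
        import java.util.UUID;
        import javax.management.ObjectName;

        public class SchemaHistory implements SchemaHistoryMBean
        {
            private static final int MAX_VERSIONS = 10;
            private final LinkedList<String> recent = new LinkedList<String>();

            /** Record a version wherever a migration is applied locally. */
            public synchronized void record(UUID newVersion)
            {
                recent.addFirst(newVersion.toString());
                if (recent.size() > MAX_VERSIONS)
                    recent.removeLast();
            }

            public synchronized List<String> getRecentSchemaVersions()
            {
                return new ArrayList<String>(recent);
            }

            public void register() throws Exception
            {
                ManagementFactory.getPlatformMBeanServer().registerMBean(
                    this, new ObjectName("org.apache.cassandra.db:type=SchemaHistory"));
            }
        }
        {code}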

        Tyler Hobbs added a comment -

        The issue is the clocks being out of sync between nodes. Sometimes the v1 UUID generated by the second node has an earlier timestamp than the current schema UUID has.

        There are a couple of things that could be fixed here:

        1. A node shouldn't accept a schema change if the timestamp for the new schema would be earlier than its current schema (see the sketch after this list).
        2. Schema modification calls should accept an optional client-side timestamp that will be used for the v1 UUID.
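
        A minimal sketch of option 1 above (roughly the direction the committed patch takes, but not the attached patch itself): compare the time component of the proposed schema version's v1 UUID against the current one and refuse anything that is not strictly newer. The class and method names are illustrative, and a real implementation would throw Cassandra's own exception type rather than IllegalStateException.

        {code:java}
        import java.util.UUID;

        public final class SchemaTimestampCheck
        {
            /**
             * Refuse a migration whose version-1 UUID timestamp is older than, or equal
             * to, the node's current schema version. UUID.timestamp() is the UUID's
             * 100-nanosecond time component and is only defined for time-based
             * (version 1) UUIDs, which schema versions are.
             */
            public static void validate(UUID currentVersion, UUID newVersion)
            {
                if (newVersion.timestamp() <= currentVersion.timestamp())
                    throw new IllegalStateException(
                        "New schema version " + newVersion + " has a timestamp older than or equal to "
                        + "the current version " + currentVersion + "; refusing to apply the migration.");
            }
        }
        {code}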

        Gary Dusbabek added a comment -

        Sometimes the v1 UUID generated by the second node has an earlier timestamp than the current schema UUID has.

        Wouldn't that update be DOA then? I thought we checked to make sure the new migration compared after the current migration (as well as making sure the new migration's previous version matches with the current version).

        A node shouldn't accept a schema change if the timestamp for the new schema would be earlier than its current schema.

        If the clocks are that far out of sync, I think the cluster has bigger problems (like writes not being applied). Plus, it would be easy for a node whose clock is way ahead to 'poison' schema updates from the rest of the cluster, which is, in effect, behind the times.

        Schema modification calls should accept an optional client-side timestamp that will be used for the v1 UUID.

        Seems like a better approach.

        Jonathan Ellis added a comment -

        A node shouldn't accept a schema change if the timestamp for the new schema would be earlier than its current schema.

        You need this with or without the client-side timestamp, though; there's no sense in letting people blow their leg off.

        And once you have that you don't need to add a client-side timestamp with all the PITA-ness that involves.

        (And unlike with data modification, I can't think of a use for doing "clever" things with a client-side timestamp. So pushing it to the client doesn't really solve anything; it just means you need to sync clocks across more machines.)

        Tyler Hobbs added a comment -

        Wouldn't that update be DOA then? I thought we checked to make sure the new migration compared after the current migration (as well as making sure the new migration's previous version matches with the current version).

        We do check that the previous version matches, but the migration is applied locally without comparing the current and new UUIDs.

        If the clocks are that far off sync, I think the cluster has bigger problems (like writes not being applied).

        This can theoretically happen with clocks being off by only tens of milliseconds.
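
        To put rough numbers on that (illustrative figures, not from the ticket): version-1 UUID timestamps are counted in 100 ns ticks, so the ordering flips whenever the second node's clock lags by more than the wall-clock gap between the two schema changes.

        {code:java}
        public class ClockSkewExample
        {
            public static void main(String[] args)
            {
                long elapsedMillis = 20;    // real time between "create KS" on node1 and "create CF" on node2
                long skewMillis = 50;       // node2's clock lags node1's by 50 ms
                long ticksPerMilli = 10000; // v1 UUID timestamps are in 100 ns units

                // Difference, in UUID timestamp ticks, between node2's new schema version
                // and node1's current one. A negative value means the "newer" schema
                // version sorts as older, which is exactly the disagreement case.
                long delta = (elapsedMillis - skewMillis) * ticksPerMilli;
                System.out.println("delta = " + delta + " ticks"); // prints -300000
            }
        }
        {code}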

        And unlike with data modification, I can't think of a use for doing "clever" things with a client-side timestamp. So pushing it to the client doesn't really solve anything; it just means you need to sync clocks across more machines.

        Not for clever purposes – it seems to me that clients making schema modifications are more likely to be centralized, so schema changes coming from a single client will (almost) always have increasing timestamps.

        Tyler Hobbs added a comment -

        The attached patch compares version timestamps before applying a migration locally.

        Tyler Hobbs added a comment -

        I personally think the timestamp comparison is good enough for now. Any interest in opening a new ticket for client-side timestamps?

        Jonathan Ellis added a comment -

        it seems to me that clients making schema modifications are more likely to be centralized

        I would have also argued that they are likely to use the same connection (to the same server), and look where that got us.

        I personally think the timestamp comparison is good enough for now

        I am okay with this. What do you think, Gary?

        (Nit: the exception message says "older" but the comparison is "older or equal.")

        Sylvain Lebresne added a comment -

        I'll hijack this conversation by saying that I think we should start advertising that people should try to keep their server clocks in sync unless they have a good reason not to (which would legitimize the fact that "timestamp comparison is good enough"). Counter removes, for instance, use server-side timestamps and would be screwed up by diverging clocks (and by that I mean more screwed up than they already are by design). And really, is there any reason not to install an ntpd server in the first place anyway?

        Gary Dusbabek added a comment -

        I think timestamp comparisons will be fine.

        Jonathan Ellis added a comment -

        committed, thanks!

        Hudson added a comment -

        Integrated in Cassandra-0.7 #462 (See https://builds.apache.org/hudson/job/Cassandra-0.7/462/)
        refuse to apply migrations with older timestamps than the current schema
        patch by Tyler Hobbs; reviewed by jbellis and gdusbabek for CASSANDRA-2536


          People

          • Assignee: Tyler Hobbs
          • Reporter: Tyler Hobbs
          • Reviewer: Jonathan Ellis
          • Votes: 1
          • Watchers: 1
