Cassandra
  1. Cassandra
  2. CASSANDRA-3794

Avoid ID conflicts from concurrent schema changes

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Change ColumnFamily identifiers to be UUIDs instead of sequential Integers. Would be useful in the situation when nodes simultaneously trying to create ColumnFamilies.

      1. CASSANDRA-3794-trunk.patch
        76 kB
        Pavel Yaskevich
      2. CASSANDRA-3794-v2.patch
        76 kB
        Pavel Yaskevich
      3. CASSANDRA-3794.patch
        56 kB
        Pavel Yaskevich

        Activity

        Hide
        Pavel Yaskevich added a comment -

        Fixed the last thing and committed, thanks!

        Show
        Pavel Yaskevich added a comment - Fixed the last thing and committed, thanks!
        Hide
        Sylvain Lebresne added a comment -

        In StreamRequest serializer, serializedSize should use CFSerializer.cfIdSerializedSize().

        But with that corrected, looks good to me, +1.

        Show
        Sylvain Lebresne added a comment - In StreamRequest serializer, serializedSize should use CFSerializer.cfIdSerializedSize(). But with that corrected, looks good to me, +1.
        Hide
        Pavel Yaskevich added a comment -

        rebased with the latest trunk and fixed nit.

        Show
        Pavel Yaskevich added a comment - rebased with the latest trunk and fixed nit.
        Hide
        Sylvain Lebresne added a comment -

        Two small nits while looking at v2 quickly:

        • In fromSchemaNoColumns, the try catch is not useful.
        • In ColumnFamilySerializer, when we serialize for an old version and can't find an oldId, it'll probably be better to use a more user friendly message like "Cannot send column family X to Y as it's version is pre-1.1.1. Please update the whole cluster to 1.1.1 first".

        But I think we agreed that it's safer to target this at 1.2 and acknowledge that concurrent table creation will only be supported then, so this will need rebase to trunk

        Show
        Sylvain Lebresne added a comment - Two small nits while looking at v2 quickly: In fromSchemaNoColumns, the try catch is not useful. In ColumnFamilySerializer, when we serialize for an old version and can't find an oldId, it'll probably be better to use a more user friendly message like "Cannot send column family X to Y as it's version is pre-1.1.1. Please update the whole cluster to 1.1.1 first". But I think we agreed that it's safer to target this at 1.2 and acknowledge that concurrent table creation will only be supported then, so this will need rebase to trunk
        Hide
        Pavel Yaskevich added a comment -

        v2 that addresses all pointes.

        Show
        Pavel Yaskevich added a comment - v2 that addresses all pointes.
        Hide
        Sylvain Lebresne added a comment -

        Looking more closely, there is actually two problems with respect to rolling upgrades:

        1. because newly created CF won't have an old format id, it means people shouldn't create any CF in a mixed version cluster. That would clearly be fine for a major upgrade, it's more annoying to roll this in a minor upgrade though imo. I don't think there is anything we can do about that.
        2. as is, streaming won't work in a mixed version cluster (as is the case in major upgrade) by virtue of the following code in IncomingTcpConnection:
          if (version == MessagingService.version_)
          {
              ....
          }
          else
          {
              // streaming connections are per-session and have a fixed version.  we can't do anything with a wrong-version stream connection, so drop it.
              logger.error("Received stream using protocol version {} (my version {}). Terminating connection",
                           version, MessagingService.version_);
          }
          

          We could avoid that, by say adding some isStreamingCompatible(v1, v2) method that would return true for VERSION_11 and VERSION_111, since after all there is no change to the stream format. However, the patch also need to version correctly StreamRequestMessage for it to work correctly.

        Overall, this is not a small patch, and it will induces more limited rolling upgrade behavior than is the norm in a minor version, so I'll admit I'm personally growing more in favor of solution #2 above (postpone to 1.2).

        That being said, on the patch itself:

        • In RowCacheKey.compareTo(), == is used intead of equals().
        • In Schema, we can remove the cfIdGen field && MIN_CF_ID.
        • nameUUIDFromBytes already does a md5 internally, so we should just pass the concatenation of ksName and cfName bytes (doubling the md5 slightly augments the chance of collisions).
        • When writing the schema, for the "id" column, the code write a string/UUID (toSchemaNoColumns) but expect an int when reading (fromSchemaNoColumns). The fact is, we don't need to save the new style id in the schema since we can recompute it. So we should keep the "id" column for oldId (if they exist). Also, when writing a CF schema, we should check if it has an associated old cfId and write it if it has (i.e. we should preserve the old ids mapping (when it exists) for now, we'll drop that in a future version).
        • Schema.addOldCfIdMapping should check for null value for the oldId and ignore it, since in fromSchemaNoColumns, result.getInt("id") will return null for new CF.
        • ColumnFamilySerializer needs to version the serialize version, when we talk to old node (same in RowMutation serialize method). Of course, when a CF don't have a old id, we'll have to throw an exception instead (that the 'user shouldn't create CF in a mixed cluster').
        • StreamRequestMessage should version cfId correctly.
        • In SchemaLoader, not sure we want to always assign an old style id to the CF. Instead, it would probably be better to add a few specific tests (serialization test ?) that validate the old id are correctly handled.
        • OCD nit: convertOldCFId could be renamed to convertOldCfId for consistency with the rest (i.e. 'F' could be lowercased)
        Show
        Sylvain Lebresne added a comment - Looking more closely, there is actually two problems with respect to rolling upgrades: because newly created CF won't have an old format id, it means people shouldn't create any CF in a mixed version cluster. That would clearly be fine for a major upgrade, it's more annoying to roll this in a minor upgrade though imo. I don't think there is anything we can do about that. as is, streaming won't work in a mixed version cluster (as is the case in major upgrade) by virtue of the following code in IncomingTcpConnection: if (version == MessagingService.version_) { .... } else { // streaming connections are per-session and have a fixed version. we can't do anything with a wrong-version stream connection, so drop it. logger.error("Received stream using protocol version {} (my version {}). Terminating connection", version, MessagingService.version_); } We could avoid that, by say adding some isStreamingCompatible(v1, v2) method that would return true for VERSION_11 and VERSION_111, since after all there is no change to the stream format. However, the patch also need to version correctly StreamRequestMessage for it to work correctly. Overall, this is not a small patch, and it will induces more limited rolling upgrade behavior than is the norm in a minor version, so I'll admit I'm personally growing more in favor of solution #2 above (postpone to 1.2). That being said, on the patch itself: In RowCacheKey.compareTo(), == is used intead of equals(). In Schema, we can remove the cfIdGen field && MIN_CF_ID. nameUUIDFromBytes already does a md5 internally, so we should just pass the concatenation of ksName and cfName bytes (doubling the md5 slightly augments the chance of collisions). When writing the schema, for the "id" column, the code write a string/UUID (toSchemaNoColumns) but expect an int when reading (fromSchemaNoColumns). The fact is, we don't need to save the new style id in the schema since we can recompute it. So we should keep the "id" column for oldId (if they exist). Also, when writing a CF schema, we should check if it has an associated old cfId and write it if it has (i.e. we should preserve the old ids mapping (when it exists) for now, we'll drop that in a future version). Schema.addOldCfIdMapping should check for null value for the oldId and ignore it, since in fromSchemaNoColumns, result.getInt("id") will return null for new CF. ColumnFamilySerializer needs to version the serialize version, when we talk to old node (same in RowMutation serialize method). Of course, when a CF don't have a old id, we'll have to throw an exception instead (that the 'user shouldn't create CF in a mixed cluster'). StreamRequestMessage should version cfId correctly. In SchemaLoader, not sure we want to always assign an old style id to the CF. Instead, it would probably be better to add a few specific tests (serialization test ?) that validate the old id are correctly handled. OCD nit: convertOldCFId could be renamed to convertOldCfId for consistency with the rest (i.e. 'F' could be lowercased)
        Hide
        Sylvain Lebresne added a comment -

        Previous oppositions weren't mine but in any case I'm personally good going with #3 too (Pavel's patch already includes the id<->uuid mapping, our only problem is the dropping of streaming connection on version mismatch). This fix doesn't affect anything streamed, so it should be fine to bump the messaging version but special case streaming so that it doesn't drop the connection for streams coming from 1.1.0.

        Show
        Sylvain Lebresne added a comment - Previous oppositions weren't mine but in any case I'm personally good going with #3 too (Pavel's patch already includes the id<->uuid mapping, our only problem is the dropping of streaming connection on version mismatch). This fix doesn't affect anything streamed, so it should be fine to bump the messaging version but special case streaming so that it doesn't drop the connection for streams coming from 1.1.0.
        Hide
        Jonathan Ellis added a comment -

        Our options include

        1. break streaming for 1.1.1 as an "emergency fix"
        2. retract claims that 1.1 allows concurrent schema changes, postpone this fix for 1.2
        3. add a mapping between old int IDs and new UUIDs and version streaming accordingly

        I'm okay with any of these, although if #3 is feasible that would be preferred.

        Show
        Jonathan Ellis added a comment - Our options include break streaming for 1.1.1 as an "emergency fix" retract claims that 1.1 allows concurrent schema changes, postpone this fix for 1.2 add a mapping between old int IDs and new UUIDs and version streaming accordingly I'm okay with any of these, although if #3 is feasible that would be preferred.
        Hide
        Sylvain Lebresne added a comment -

        For info, changing the messaging service version will break streaming during the upgrade. And last time we tried, there was strong opposition in changing the messaging service version in a minor release. Not that I have a good solution for fixing this without breaking the protocol.

        Show
        Sylvain Lebresne added a comment - For info, changing the messaging service version will break streaming during the upgrade. And last time we tried, there was strong opposition in changing the messaging service version in a minor release. Not that I have a good solution for fixing this without breaking the protocol.
        Hide
        Pavel Yaskevich added a comment -

        Made a migration from int to uuid, made sure that all SerializationTests pass (and all other tests) and schema is merged correctly in the situations described in previous comments.

        Show
        Pavel Yaskevich added a comment - Made a migration from int to uuid, made sure that all SerializationTests pass (and all other tests) and schema is merged correctly in the situations described in previous comments.
        Hide
        Jonathan Ellis added a comment -

        As long as everyone agrees on one of the CF definitions eventually, or even a combination of the two, that's totally fine.

        Show
        Jonathan Ellis added a comment - As long as everyone agrees on one of the CF definitions eventually, or even a combination of the two, that's totally fine.
        Hide
        Pavel Yaskevich added a comment -

        It doesn't really help tho if the same CF created on the different nodes simultaneously.

        Show
        Pavel Yaskevich added a comment - It doesn't really help tho if the same CF created on the different nodes simultaneously.
        Hide
        Jonathan Ellis added a comment -

        Sounds like we should prototype the "just use cfname" approach and see how bad the hit is in practice. If we're lucky it will be negligible and we can move on.

        Show
        Jonathan Ellis added a comment - Sounds like we should prototype the "just use cfname" approach and see how bad the hit is in practice. If we're lucky it will be negligible and we can move on.
        Hide
        Pavel Yaskevich added a comment -

        Yeah, I think that this was the main reason to add that. Although change proposed here would only work if CFs simultaneously created on the different machines are not the same CF otherwise we would have hopeless situation upon merge which would break the whole schema. We can't really check what schema mutation would lead to before we apply it and try to merge so I don't really see what we can do about it...

        Show
        Pavel Yaskevich added a comment - Yeah, I think that this was the main reason to add that. Although change proposed here would only work if CFs simultaneously created on the different machines are not the same CF otherwise we would have hopeless situation upon merge which would break the whole schema. We can't really check what schema mutation would lead to before we apply it and try to merge so I don't really see what we can do about it...
        Hide
        Jonathan Ellis added a comment -

        I guess a side benefit is, we cut down on the data we send for each insert or read request...

        Show
        Jonathan Ellis added a comment - I guess a side benefit is, we cut down on the data we send for each insert or read request...
        Hide
        Pavel Yaskevich added a comment -

        This got me thinking that if we change int to UUID it wouldn't actually change the deal unless we generate UUID based on the CF attributes (so we can merge simultaneously created CF on the different machines) but after we update attributes again we would have to re-generate id. The question is - do we really need ID for CF? We can guarantee uniqueness of CF in the given keyspace by CF's name...

        Show
        Pavel Yaskevich added a comment - This got me thinking that if we change int to UUID it wouldn't actually change the deal unless we generate UUID based on the CF attributes (so we can merge simultaneously created CF on the different machines) but after we update attributes again we would have to re-generate id. The question is - do we really need ID for CF? We can guarantee uniqueness of CF in the given keyspace by CF's name...
        Hide
        Jonathan Ellis added a comment -

        I guess we'll need some kind of Int -> UUID [bi]map to maintain compatibility during rolling upgrade...

        Show
        Jonathan Ellis added a comment - I guess we'll need some kind of Int -> UUID [bi] map to maintain compatibility during rolling upgrade...
        Hide
        Pavel Yaskevich added a comment -

        Sure it is, given that CFs could be created on the disconnected nodes and have the same cfId assigned - after the schema merge system would crash with RTE.

        Show
        Pavel Yaskevich added a comment - Sure it is, given that CFs could be created on the disconnected nodes and have the same cfId assigned - after the schema merge system would crash with RTE.
        Hide
        Jonathan Ellis added a comment -

        Is this still useful given the CASSANDRA-1391 implementation?

        Show
        Jonathan Ellis added a comment - Is this still useful given the CASSANDRA-1391 implementation?
        Hide
        Jonathan Ellis added a comment -

        I believe Pavel is working on other things now (he can correct me if I am wrong) so I am clearing the assignment in case someone else can tackle it.

        Show
        Jonathan Ellis added a comment - I believe Pavel is working on other things now (he can correct me if I am wrong) so I am clearing the assignment in case someone else can tackle it.
        Hide
        Jonathan Ellis added a comment -

        How do we do this w/o breaking compatibility during the upgrade, is my question.

        Show
        Jonathan Ellis added a comment - How do we do this w/o breaking compatibility during the upgrade, is my question.
        Hide
        Peter Schuller added a comment -

        Which I interpret as the cf id:s that are embedded in e.g. row mutations (remember all those "invalid cfId..." cases upon schema changes). In general, UUID:s should minimize the chances of accidentally applying something to the wrong CF (which I suppose is a generalization of the simultaneous creation case mentioned in the ticket description).

        Show
        Peter Schuller added a comment - Which I interpret as the cf id:s that are embedded in e.g. row mutations (remember all those "invalid cfId..." cases upon schema changes). In general, UUID:s should minimize the chances of accidentally applying something to the wrong CF (which I suppose is a generalization of the simultaneous creation case mentioned in the ticket description).
        Hide
        Pavel Yaskevich added a comment -

        I don't think so because CASSANDRA-1983 proposes to change the way we identify SSTables and this one proposes to change the way we identify ColumnFamilies.

        Show
        Pavel Yaskevich added a comment - I don't think so because CASSANDRA-1983 proposes to change the way we identify SSTables and this one proposes to change the way we identify ColumnFamilies.
        Hide
        Marcus Eriksson added a comment -

        duplicate of CASSANDRA-1983 right?

        Show
        Marcus Eriksson added a comment - duplicate of CASSANDRA-1983 right?

          People

          • Assignee:
            Pavel Yaskevich
            Reporter:
            Pavel Yaskevich
            Reviewer:
            Sylvain Lebresne
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development