Cassandra
  1. Cassandra
  2. CASSANDRA-5633

CQL support for updating multiple rows in a partition using CAS

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      This is currently supported via Thrift but not via CQL.

        Issue Links

          Activity

          Hide
          Jonathan Ellis added a comment -

          This indeed seems to be a solution to our problem and we could certainly model our data using this approach. I am not sold that we need to introduce yet another concept just for syntactic ease, but this solution does offer us a very straightforward way to specify this specific CAS use case.

          Well, either way we're introducing new concepts. So the question is which approach is more useful, and the "static columns" ticket addresses the "how do i port my thrift code that mixes static and dynamic columns in a single row" problem more generally, which does seem to come up on a regular basis. So I'd prefer to run with that.

          Show
          Jonathan Ellis added a comment - This indeed seems to be a solution to our problem and we could certainly model our data using this approach. I am not sold that we need to introduce yet another concept just for syntactic ease, but this solution does offer us a very straightforward way to specify this specific CAS use case. Well, either way we're introducing new concepts. So the question is which approach is more useful, and the "static columns" ticket addresses the "how do i port my thrift code that mixes static and dynamic columns in a single row" problem more generally, which does seem to come up on a regular basis. So I'd prefer to run with that.
          Hide
          Alex P added a comment -

          Some extensions of SQL (T-SQL, PLSQL) support IF ELSE statements. So another way to do this could be:

          IF (condition) THEN
          BEGIN BATCH
          APPLY BATCH
          

          The only "advantages" of this approach would be:

          1. a structure that is familiar
          2. is just an extension of the current BATCH

          Show
          Alex P added a comment - Some extensions of SQL (T-SQL, PLSQL) support IF ELSE statements. So another way to do this could be: IF (condition) THEN BEGIN BATCH APPLY BATCH The only "advantages" of this approach would be: 1. a structure that is familiar 2. is just an extension of the current BATCH
          Hide
          Sebastian Schmidt added a comment -

          This indeed seems to be a solution to our problem and we could certainly model our data using this approach. I am not sold that we need to introduce yet another concept just for syntactic ease, but this solution does offer us a very straightforward way to specify this specific CAS use case.

          Show
          Sebastian Schmidt added a comment - This indeed seems to be a solution to our problem and we could certainly model our data using this approach. I am not sold that we need to introduce yet another concept just for syntactic ease, but this solution does offer us a very straightforward way to specify this specific CAS use case.
          Hide
          Sylvain Lebresne added a comment -

          I'm starting to wonder if another approaches to this wouldn't be simpler. Namely, I'd like to suggest the idea of supporting "static" columns (the initial idea was suggested by jhalliday on irc). That is, we allow to declare some columns that as "static" to a partition, i.e. their value would be shared by all the rows of the same partition. The reason this is related to this issue is that you could CAS those static columns to update multiple rows of the same partition atomically.

          Concretely, you could define something like that:

          CREATE TABLE t (
            id text PRIMARY KEY,
            version int static,
            insertion_time timeuuid,
            prop1 text,
            prop2 int,
            PRIMARY KEY (insertion_time, prop1, prop2)
          )
          

          The version column being static, it's value would be shared by all rows having the same id so that you can use it as a "partition version" that allows to serialize inserts. More precisely, you'd read some row(s) of the partition, and update some other row of the partition based on CASing the version just read.

          Though it's not 100% equivalent to what the other suggestion of this ticket, I believe this "static" solution would be as general as anything else in terms of what can be done since you can serialize updates in any order you want. And in fact, for every concrete use case I have in mind for this ticket, this static column solution seems to provide a more natural/direct solution (of course, it's quite possible there is use cases I haven't though of and for which this "static columns" idea would be very awkward, but I'd be happy to understand those).

          Other advantages of this static columns solution I can think of are that:

          1. it doesn't require any complex syntax. We'll have to define a few rules to govern those static columns (when do they get deleted, etc..), but syntax wise, it would really all just be the introduction of the "static" keyword in table creation.
          2. it has uses outside of CAS, making it less of a narrow use case. There are cases where people want to basically cram a static and a dynamic table into a single table for efficiency reasons, and this would provide a "native" way to support that.

          I'll soon open a separate issue for this "static columns" idea, with a bit more detail on the exact semantic and some pointers on how I think this can be implemented, but Sebastian Schmidt, is that something which sounds like it would fit well to your use cases? (and if not, can you try to explain why, if only for the sake of better understanding what we're trying to solve here).

          Show
          Sylvain Lebresne added a comment - I'm starting to wonder if another approaches to this wouldn't be simpler. Namely, I'd like to suggest the idea of supporting "static" columns (the initial idea was suggested by jhalliday on irc). That is, we allow to declare some columns that as "static" to a partition, i.e. their value would be shared by all the rows of the same partition. The reason this is related to this issue is that you could CAS those static columns to update multiple rows of the same partition atomically. Concretely, you could define something like that: CREATE TABLE t ( id text PRIMARY KEY, version int static, insertion_time timeuuid, prop1 text, prop2 int, PRIMARY KEY (insertion_time, prop1, prop2) ) The version column being static, it's value would be shared by all rows having the same id so that you can use it as a "partition version" that allows to serialize inserts. More precisely, you'd read some row(s) of the partition, and update some other row of the partition based on CASing the version just read. Though it's not 100% equivalent to what the other suggestion of this ticket, I believe this "static" solution would be as general as anything else in terms of what can be done since you can serialize updates in any order you want. And in fact, for every concrete use case I have in mind for this ticket, this static column solution seems to provide a more natural/direct solution (of course, it's quite possible there is use cases I haven't though of and for which this "static columns" idea would be very awkward, but I'd be happy to understand those). Other advantages of this static columns solution I can think of are that: it doesn't require any complex syntax. We'll have to define a few rules to govern those static columns (when do they get deleted, etc..), but syntax wise, it would really all just be the introduction of the "static" keyword in table creation. it has uses outside of CAS, making it less of a narrow use case. There are cases where people want to basically cram a static and a dynamic table into a single table for efficiency reasons, and this would provide a "native" way to support that. I'll soon open a separate issue for this "static columns" idea, with a bit more detail on the exact semantic and some pointers on how I think this can be implemented, but Sebastian Schmidt , is that something which sounds like it would fit well to your use cases? (and if not, can you try to explain why, if only for the sake of better understanding what we're trying to solve here).
          Hide
          Sebastian Schmidt added a comment -

          We are pretty much stuck with Hector right now and would like to move to CQL once this gets implemented. Our use case is not very specific, but we are trying to model data that contains relationships that need to be updated atomically. Using CAS through CQL makes it impossible for us to use atomic updates, which can easily break our constraints. We believe that any data model that contains dependencies or relationships requires such a functionality should CAS be used as an op-lock mechanism.

          As a suggestion for syntax, we have the following to offer:

          update cf set c='a', d='b' where foo='a' and bar = 'b', set c='x', d='y' where foo='a' and bar = 'c' if (bar='b' and c='d’), (bar='g' and c='h’);

          where the table is:

          CREATE TABLE cf (
          foo text,
          bar text,
          c text,
          d text,
          PRIMARY KEY (foo, bar)
          )

          Show
          Sebastian Schmidt added a comment - We are pretty much stuck with Hector right now and would like to move to CQL once this gets implemented. Our use case is not very specific, but we are trying to model data that contains relationships that need to be updated atomically. Using CAS through CQL makes it impossible for us to use atomic updates, which can easily break our constraints. We believe that any data model that contains dependencies or relationships requires such a functionality should CAS be used as an op-lock mechanism. As a suggestion for syntax, we have the following to offer: update cf set c='a', d='b' where foo='a' and bar = 'b', set c='x', d='y' where foo='a' and bar = 'c' if (bar='b' and c='d’), (bar='g' and c='h’); where the table is: CREATE TABLE cf ( foo text, bar text, c text, d text, PRIMARY KEY (foo, bar) )
          Hide
          Sylvain Lebresne added a comment -

          I'm also quite reluctant in adding some sub-query syntax "just for this", mainly for the cons expresses above, that is 1) the fact that it confuses quite a bit the "we don't do sub-query" message and 2) it doesn't make it very clear what the restrictions on that sub-query are. Which is not to say that I have a much better idea. But maybe a better understanding of what this could be useful for in practice could help here? Granted "thrift can do it" is one type of argument, but outside that, a concrete CQL3 example of why this could be useful could help might help making some progress?

          Show
          Sylvain Lebresne added a comment - I'm also quite reluctant in adding some sub-query syntax "just for this", mainly for the cons expresses above, that is 1) the fact that it confuses quite a bit the "we don't do sub-query" message and 2) it doesn't make it very clear what the restrictions on that sub-query are. Which is not to say that I have a much better idea. But maybe a better understanding of what this could be useful for in practice could help here? Granted "thrift can do it" is one type of argument, but outside that, a concrete CQL3 example of why this could be useful could help might help making some progress?
          Hide
          Jonathan Ellis added a comment -

          Well, that's the point – you only get one condition w/ CAS so that should be explicit in the syntax.

          Show
          Jonathan Ellis added a comment - Well, that's the point – you only get one condition w/ CAS so that should be explicit in the syntax.
          Hide
          Aleksey Yeschenko added a comment -

          This looks all right with '...', but specifying multiple conditions here is going to be truly verbose with IF (SELECT ...) = ... AND ...

          Show
          Aleksey Yeschenko added a comment - This looks all right with '...', but specifying multiple conditions here is going to be truly verbose with IF (SELECT ...) = ... AND ...
          Hide
          Jonathan Ellis added a comment -

          I submit that a more regular extention of batch for the former would be

          BEGIN BATCH
            UPDATE FOO ...
            UPDATE FOO ...
          APPLY BATCH IF ...
          
          Show
          Jonathan Ellis added a comment - I submit that a more regular extention of batch for the former would be BEGIN BATCH UPDATE FOO ... UPDATE FOO ... APPLY BATCH IF ...
          Hide
          Jonathan Ellis added a comment -

          Agreed, although I only have a specific use case for the latter.

          Show
          Jonathan Ellis added a comment - Agreed, although I only have a specific use case for the latter.
          Hide
          Aleksey Yeschenko added a comment -

          I don't see how this solves the problem of "I want to CAS row X based on the value of row Y." Your example looks more like 3 separate operations each of which has CAS-in-same-row.

          True. But to match thrift api you are going to need both the ability to update several rows in a single cas operation and the ability to refer to other rows within the same partition in conditions.

          Show
          Aleksey Yeschenko added a comment - I don't see how this solves the problem of "I want to CAS row X based on the value of row Y." Your example looks more like 3 separate operations each of which has CAS-in-same-row. True. But to match thrift api you are going to need both the ability to update several rows in a single cas operation and the ability to refer to other rows within the same partition in conditions.
          Hide
          Jonathan Ellis added a comment -

          Maybe some new kind of BATCH, CAS BATCH?

          I don't see how this solves the problem of "I want to CAS row X based on the value of row Y." Your example looks more like 3 separate operations each of which has CAS-in-same-row.

          Show
          Jonathan Ellis added a comment - Maybe some new kind of BATCH, CAS BATCH? I don't see how this solves the problem of "I want to CAS row X based on the value of row Y." Your example looks more like 3 separate operations each of which has CAS-in-same-row.
          Hide
          Jonathan Ellis added a comment -

          Well, given that we have someone who's pretty likely to go live with this in Thrift, and we tell people that CQL exposes everything you can do from Thrift, I don't like that solution either.

          Show
          Jonathan Ellis added a comment - Well, given that we have someone who's pretty likely to go live with this in Thrift, and we tell people that CQL exposes everything you can do from Thrift, I don't like that solution either.
          Hide
          Aleksey Yeschenko added a comment -

          I like this even less
          Maybe we shouldn't expose it to cql at all, if it doesn't fit the language?

          Show
          Aleksey Yeschenko added a comment - I like this even less Maybe we shouldn't expose it to cql at all, if it doesn't fit the language?
          Hide
          Jonathan Ellis added a comment - - edited

          How about this?

          UPDATE foo SET x = 10 WHERE partition='key' AND bar=1
          IF (SELECT y FROM foo WHERE partition='key' AND baz = 2) = 3
          

          Pro:

          1. Good match with semantics of what CAS actually does under the hood
          2. Fairly natural for SQL users

          Con:

          1. Need to invent syntax for multiple columns (IF (SELECT y, z ...) = (3, 4))
          2. Risks confusing people who wonder why we don't support subqueries elsewhere
          3. Not necessarily obvious that subquery is restricted to the same partition
          Show
          Jonathan Ellis added a comment - - edited How about this? UPDATE foo SET x = 10 WHERE partition='key' AND bar=1 IF (SELECT y FROM foo WHERE partition='key' AND baz = 2) = 3 Pro: Good match with semantics of what CAS actually does under the hood Fairly natural for SQL users Con: Need to invent syntax for multiple columns ( IF (SELECT y, z ...) = (3, 4) ) Risks confusing people who wonder why we don't support subqueries elsewhere Not necessarily obvious that subquery is restricted to the same partition
          Hide
          Aleksey Yeschenko added a comment -

          We only support updating several rows using BATCH in cql3, so this would somehow involve BATCH as well.

          Maybe some new kind of BATCH, CAS BATCH?

          BEGIN CAS BATCH
          UPDATE foo SET x = 10 WHERE bar = 1 AND baz = 'k' IF x = 9
          UPDATE foo SET y = 15 WHERE bar = 1 AND baz = 'y' IF y = 7
          DELETE FROM foo WHERE bar = 1 AND baz = 'z' IF NOT EXISTS
          APPLY BATCH
          

          We'll validate the partition key is the same (bar) and merge all the conditions together (all the updates, too) and it will only apply if all the conditions are true.
          Not all statements even have to involve CAS, but at least one should.

          It's not perfect, just the first thing that came to mind.

          Show
          Aleksey Yeschenko added a comment - We only support updating several rows using BATCH in cql3, so this would somehow involve BATCH as well. Maybe some new kind of BATCH, CAS BATCH? BEGIN CAS BATCH UPDATE foo SET x = 10 WHERE bar = 1 AND baz = 'k' IF x = 9 UPDATE foo SET y = 15 WHERE bar = 1 AND baz = 'y' IF y = 7 DELETE FROM foo WHERE bar = 1 AND baz = 'z' IF NOT EXISTS APPLY BATCH We'll validate the partition key is the same (bar) and merge all the conditions together (all the updates, too) and it will only apply if all the conditions are true. Not all statements even have to involve CAS, but at least one should. It's not perfect, just the first thing that came to mind.
          Hide
          Jonathan Ellis added a comment -

          Drawing a blank for syntax suggestions, how about you Aleksey Yeschenko Vijay?

          Show
          Jonathan Ellis added a comment - Drawing a blank for syntax suggestions, how about you Aleksey Yeschenko Vijay ?

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              sankalp kohli
            • Votes:
              2 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development