Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 2.0.6
    • Component/s: None
    • Labels:
      None

      Description

      I'd like to suggest the following idea for adding "static" columns to CQL3. I'll note that the basic idea has been suggested by jhalliday on irc but the rest of the details are mine and I should be blamed for anything stupid in what follows.

      Let me start with a rational: there is 2 main family of CF that have been historically used in Thrift: static ones and dynamic ones. CQL3 handles both family through the presence or not of clustering columns. There is however some cases where mixing both behavior has its use. I like to think of those use cases as 3 broad category:

      1. to denormalize small amounts of not-entirely-static data in otherwise static entities. It's say "tags" for a product or "custom properties" in a user profile. This is why we've added CQL3 collections. Importantly, this is the only use case for which collections are meant (which doesn't diminishes their usefulness imo, and I wouldn't disagree that we've maybe not communicated this too well).
      2. to optimize fetching both a static entity and related dynamic ones. Say you have blog posts, and each post has associated comments (chronologically ordered). And say that a very common query is "fetch a post and its 50 last comments". In that case, it might be beneficial to store a blog post (static entity) in the same underlying CF than it's comments for performance reason. So that "fetch a post and it's 50 last comments" is just one slice internally.
      3. you want to CAS rows of a dynamic partition based on some partition condition. This is the same use case than why CASSANDRA-5633 exists for.

      As said above, 1) is already covered by collections, but 2) and 3) are not (and
      I strongly believe collections are not the right fit, API wise, for those).

      Also, note that I don't want to underestimate the usefulness of 2). In most cases, using a separate table for the blog posts and the comments is The Right Solution, and trying to do 2) is premature optimisation. Yet, when used properly, that kind of optimisation can make a difference, so I think having a relatively native solution for it in CQL3 could make sense.

      Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have the feeling that static columns actually are a more natural approach (in term of API). That's arguably more of a personal opinion/feeling though.

      So long story short, CQL3 lacks a way to mix both some "static" and "dynamic" rows in the same partition of the same CQL3 table, and I think such a tool could have it's use.

      The proposal is thus to allow "static" columns. Static columns would only make sense in table with clustering columns (the "dynamic" ones). A static column value would be static to the partition (all rows of the partition would share the value for such column). The syntax would just be:

      CREATE TABLE t (
        k text,
        s text static,
        i int,
        v text,
        PRIMARY KEY (k, i)
      )
      

      then you'd get:

      INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm shared",       0, "foo");
      INSERT INTO t(k, s, i, v) VALUES ("k0", "I'm still shared", 1, "bar");
      SELECT * FROM t;
       k |                  s | i |    v
      ------------------------------------
      k0 | "I'm still shared" | 0 | "bar"
      k0 | "I'm still shared" | 1 | "foo"
      

      There would be a few semantic details to decide on regarding deletions, ttl, etc. but let's see if we agree it's a good idea first before ironing those out.

      One last point is the implementation. Though I do think this idea has merits, it's definitively not useful enough to justify rewriting the storage engine for it. But I think we can support this relatively easily (emphasis on "relatively" ), which is probably the main reason why I like the approach.

      Namely, internally, we can store static columns as cells whose clustering column values are empty. So in terms of cells, the partition of my example would look like:

      "k0" : [
        (:"s" -> "I'm still shared"), // the static column
        (0:"" -> "")                  // row marker
        (0:"v" -> "bar")
        (1:"" -> "")                  // row marker
        (1:"v" -> "foo")
      ]
      

      Of course, using empty values for the clustering columns doesn't quite work because it could conflict with the user using empty clustering columns. But in the CompositeType encoding we have the end-of-component byte that we could reuse by using a specific value (say 0xFF, currently we never set that byte to anything else than -1, 0 and 1) to indicate it's a static column.

      With that, we'd need to update the CQL3 statements to support the new syntax and rules, but that's probably not horribly hard.

      So anyway, this may or may not be a good idea, but I think it has enough meat to warrant some consideration.

        Issue Links

          Activity

          Hide
          DOAN DuyHai added a comment -

          Ah yes, forgot about compression on protocol side. Thanks.

          Show
          DOAN DuyHai added a comment - Ah yes, forgot about compression on protocol side. Thanks.
          Hide
          Aleksey Yeschenko added a comment -

          Yes they are sent. This shouldn't be an issue with compression enabled.

          Show
          Aleksey Yeschenko added a comment - Yes they are sent. This shouldn't be an issue with compression enabled.
          Hide
          DOAN DuyHai added a comment -

          It's me again.

          By talking with my collegues about the static columns this morning, one interesting question arises.

          CREATE TABLE blogpost (
             id bigint,
             content text static, 
             creation_date timestamp static,
             author_id text static,
             comment_date timeuuid,
             comment_author_id text,
             comment text,
             PRIMARY KEY (id,comment_date)) WITH CLUSTERING ORDER (comment_date DESC);
          

          If we do something like :

           SELECT * FROM blogpost WHERE id=xxx LIMIT 10;
          

          We'll have the blog post content plus the last 10 comments, pretty clear.

          The answer is something like

          id content creation_date author_id comment_date comment_author_id comment
          10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 22:00:00 user1 last comment
          10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:50:00 user2 before last comment
          10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:40:00 user3 another comment
          10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:30:00 user4 and again
          10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:00:00 user5 and again

          As you can see the first 4 columns are duplicated, only the last 3 columns vary, as per the CQL3 semantics.

          The question is: how are data sent over the wire ? Are the first 4 columns duplication sent over the network or does the Java driver "reformat" the data to show them duplicated ?

          Show
          DOAN DuyHai added a comment - It's me again. By talking with my collegues about the static columns this morning, one interesting question arises. CREATE TABLE blogpost ( id bigint, content text static, creation_date timestamp static, author_id text static, comment_date timeuuid, comment_author_id text, comment text, PRIMARY KEY (id,comment_date)) WITH CLUSTERING ORDER (comment_date DESC); If we do something like : SELECT * FROM blogpost WHERE id=xxx LIMIT 10; We'll have the blog post content plus the last 10 comments, pretty clear. The answer is something like id content creation_date author_id comment_date comment_author_id comment 10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 22:00:00 user1 last comment 10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:50:00 user2 before last comment 10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:40:00 user3 another comment 10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:30:00 user4 and again 10 big blog text 2014/02/20 10:00:00 author1 2014/02/20 21:00:00 user5 and again As you can see the first 4 columns are duplicated, only the last 3 columns vary, as per the CQL3 semantics. The question is: how are data sent over the wire ? Are the first 4 columns duplication sent over the network or does the Java driver "reformat" the data to show them duplicated ?
          Hide
          Sylvain Lebresne added a comment -

          Does C* raise exception if I try to create a static column in a plain non-clustered table ?

          It does.

          Show
          Sylvain Lebresne added a comment - Does C* raise exception if I try to create a static column in a plain non-clustered table ? It does.
          Hide
          Aleksey Yeschenko added a comment -

          Does C* raise exception if I try to create a static column in a plain non-clustered table ?

          Yes.

          Show
          Aleksey Yeschenko added a comment - Does C* raise exception if I try to create a static column in a plain non-clustered table ? Yes.
          Hide
          DOAN DuyHai added a comment -

          Really great change ! It brings lots of new possibilities for data modeling with CQL3, big +1.

          A very dumb question though: is it allowed to create a static column in a non-clustered table ?

          I mean, it clearly does not make sense to do so but the question is more on validation & checking side ? Does C* raise exception if I try to create a static column in a plain non-clustered table ?

          Show
          DOAN DuyHai added a comment - Really great change ! It brings lots of new possibilities for data modeling with CQL3, big +1. A very dumb question though: is it allowed to create a static column in a non-clustered table ? I mean, it clearly does not make sense to do so but the question is more on validation & checking side ? Does C* raise exception if I try to create a static column in a plain non-clustered table ?
          Hide
          Sylvain Lebresne added a comment -

          Committed (with update to the CQL doc added). Thanks.

          Show
          Sylvain Lebresne added a comment - Committed (with update to the CQL doc added). Thanks.
          Hide
          Aleksey Yeschenko added a comment -

          LGTM, +1

          Show
          Aleksey Yeschenko added a comment - LGTM, +1
          Hide
          Sylvain Lebresne added a comment -

          his example query shows a bug with 2i lookup (with a 2i on a non-static column) + static columns.

          Oh right, we do need to handle static columns there, my bad. Pushed an additional commit to the branch (https://github.com/pcmanus/cassandra/commits/6561-4) for that.

          Show
          Sylvain Lebresne added a comment - his example query shows a bug with 2i lookup (with a 2i on a non-static column) + static columns. Oh right, we do need to handle static columns there, my bad. Pushed an additional commit to the branch ( https://github.com/pcmanus/cassandra/commits/6561-4 ) for that.
          Hide
          Nicolas Favre-Felix added a comment -

          Ah, my apologies for the confusion, I didn't realize the null was due to the 2i issue.
          Selecting a single CQL row does include the static columns now.

          Show
          Nicolas Favre-Felix added a comment - Ah, my apologies for the confusion, I didn't realize the null was due to the 2i issue. Selecting a single CQL row does include the static columns now.
          Hide
          Aleksey Yeschenko added a comment -

          And I guess I agree with you on the CAS batch conflicting value validation thing. Leave it be.

          Show
          Aleksey Yeschenko added a comment - And I guess I agree with you on the CAS batch conflicting value validation thing. Leave it be.
          Hide
          Aleksey Yeschenko added a comment -

          Nicolas' example uses a 2i on a non-static column, which as far as I can tell should work properly. Creating 2i on static columns is indeed not supported, but as the comment in CreateIndexStatement says, this require more 2i specific work to support (and could imo easily let user shoot themselves in the foot) and is left to a later ticket.

          Yeah, I've read every commit line by line, I know what it says. That's the point - Nicolas' issue is a non-issue, but his example query shows a bug with 2i lookup (with a 2i on a non-static column) + static columns. Namely, they aren't being selected. There is nothing fundamental here, just some bug that makes it not work.

          Now maybe there is some other corner cases that you have in mind and are not properly handled by the patch, but then please simply provide the exact query that does not work as you think should and I'll look at it

          This is the query -

          SELECT * FROM bills WHERE user='user1' AND paid=false;

          'balance' here will be null, even when the static column has a value.

          Show
          Aleksey Yeschenko added a comment - Nicolas' example uses a 2i on a non-static column, which as far as I can tell should work properly. Creating 2i on static columns is indeed not supported, but as the comment in CreateIndexStatement says, this require more 2i specific work to support (and could imo easily let user shoot themselves in the foot) and is left to a later ticket. Yeah, I've read every commit line by line, I know what it says. That's the point - Nicolas' issue is a non-issue, but his example query shows a bug with 2i lookup (with a 2i on a non-static column) + static columns. Namely, they aren't being selected. There is nothing fundamental here, just some bug that makes it not work. Now maybe there is some other corner cases that you have in mind and are not properly handled by the patch, but then please simply provide the exact query that does not work as you think should and I'll look at it This is the query - SELECT * FROM bills WHERE user='user1' AND paid= false ; 'balance' here will be null, even when the static column has a value.
          Hide
          Sylvain Lebresne added a comment -

          Other than the Nicolas' issue with static columns and 2i

          Nicolas' example uses a 2i on a non-static column, which as far as I can tell should work properly. Creating 2i on static columns is indeed not supported, but as the comment in CreateIndexStatement says, this require more 2i specific work to support (and could imo easily let user shoot themselves in the foot) and is left to a later ticket.

          I'm on the fence whether or not we should perform the same validation on the cells, too

          I'm relatively strongly against creating an inconsistency here. I'm not saying allowing duplicates for regular batch was a good idea, in fact I think that's an oversight, but I don't think making a difference because there is one or more condition on the batch or not makes particular sense (conditions only control whether or not the batch is executed, but other than that a CAS batch will execute exactly the same way than a regular one).

          Note that in theory, I'd be in favor of always refusing duplicates in batches, but obviously in practice there is the backward compatibility concern and so maybe it's just too late. Or maybe we could do something like with CASSANDRA-6649, just warning for 1 or 2 versions and refusing them altogether after that. But in any case, this belong to another issue imo.

          There is currently no way to select together both the static and clustered columns of a CQL row

          Yes there is, see https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py#L3542. The only thing we've disallowed is to select only the static columns when clustering columns are in the where clause, because that strongly suggest you're doing something wrong and you'll get the right behavior if you just remove the clustering columns from the where clause. This particular restriction does not forbid any use case in particular. Now maybe there is some other corner cases that you have in mind and are not properly handled by the patch, but then please simply provide the exact query that does not work as you think should and I'll look at it (it's even better if you reuse the example of the dtest above).

          Show
          Sylvain Lebresne added a comment - Other than the Nicolas' issue with static columns and 2i Nicolas' example uses a 2i on a non-static column, which as far as I can tell should work properly. Creating 2i on static columns is indeed not supported, but as the comment in CreateIndexStatement says, this require more 2i specific work to support (and could imo easily let user shoot themselves in the foot) and is left to a later ticket. I'm on the fence whether or not we should perform the same validation on the cells, too I'm relatively strongly against creating an inconsistency here. I'm not saying allowing duplicates for regular batch was a good idea, in fact I think that's an oversight, but I don't think making a difference because there is one or more condition on the batch or not makes particular sense (conditions only control whether or not the batch is executed, but other than that a CAS batch will execute exactly the same way than a regular one). Note that in theory, I'd be in favor of always refusing duplicates in batches, but obviously in practice there is the backward compatibility concern and so maybe it's just too late. Or maybe we could do something like with CASSANDRA-6649 , just warning for 1 or 2 versions and refusing them altogether after that. But in any case, this belong to another issue imo. There is currently no way to select together both the static and clustered columns of a CQL row Yes there is, see https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py#L3542 . The only thing we've disallowed is to select only the static columns when clustering columns are in the where clause, because that strongly suggest you're doing something wrong and you'll get the right behavior if you just remove the clustering columns from the where clause. This particular restriction does not forbid any use case in particular. Now maybe there is some other corner cases that you have in mind and are not properly handled by the patch, but then please simply provide the exact query that does not work as you think should and I'll look at it (it's even better if you reuse the example of the dtest above).
          Hide
          Nicolas Favre-Felix added a comment -

          Sylvain Lebresne, my point was not about 2i, but about the fact that partition-level isolation is only useful if during read you can select both the static and the clustered columns. An isolation property is used to guarantee that readers do not see partially-applied updates. If readers have to issue two queries to select the static and the clusters columns, they cannot rely on any isolation. They will see partially applied updates and have an inconsistent view of the partition.

          There is currently no way to select together both the static and clustered columns of a CQL row, even though this is how they are returned when the full partition is queried.

          There are two more points on this subject:

          1. It seems to me that presenting CQL rows with the static columns for full-partition scans but not doing so in any other case displays a level of inconsistency in the API.
          2. All of this is pretty easy to do with Thrift.
          Show
          Nicolas Favre-Felix added a comment - Sylvain Lebresne , my point was not about 2i, but about the fact that partition-level isolation is only useful if during read you can select both the static and the clustered columns. An isolation property is used to guarantee that readers do not see partially-applied updates. If readers have to issue two queries to select the static and the clusters columns, they cannot rely on any isolation. They will see partially applied updates and have an inconsistent view of the partition. There is currently no way to select together both the static and clustered columns of a CQL row, even though this is how they are returned when the full partition is queried. There are two more points on this subject: It seems to me that presenting CQL rows with the static columns for full-partition scans but not doing so in any other case displays a level of inconsistency in the API. All of this is pretty easy to do with Thrift.
          Hide
          Aleksey Yeschenko added a comment -

          That, and some nits:

          • ColumnCondition could use some bling (http://youtu.be/lWA2pjMjpBs)
          • CQL3CasConditions.ColumnsConditions doesn't need cfm. Also, one more diamond missing in CQL3CasConditions constructor.
          Show
          Aleksey Yeschenko added a comment - That, and some nits: ColumnCondition could use some bling ( http://youtu.be/lWA2pjMjpBs ) CQL3CasConditions.ColumnsConditions doesn't need cfm. Also, one more diamond missing in CQL3CasConditions constructor.
          Hide
          Aleksey Yeschenko added a comment -

          Nicolas Favre-Felix that's not the point of that change.

          The idea was for this to return the row and the static value, as expected:

          cqlsh:ks1> SELECT * FROM bills WHERE user='user1' and expense_id = 1000;
          
           user  | expense_id | balance | amount | item    | paid
          -------+------------+---------+--------+---------+------
           user1 |       1000 |    -200 |      8 | burrito | True
          
          (1 rows)
          

          But for this throw an IRE, to match UPDATE/DELETE behavior:

          cqlsh:ks1> SELECT balance FROM bills WHERE user='user1' and expense_id = 1000;
          Bad Request: Cannot restrict clustering columns when selecting only static columns
          

          And https://github.com/pcmanus/cassandra/commit/67b4f976b57399bcbb94b3b03eaaf71842b6e843 does exactly this.

          Your issue looks entirely unrelated to me (it's about static columns and 2i queries), but yeah, it's still an issue to be addressed in the next iteration.

          Sylvain Lebresne Looks almost LGTM. Other than the Nicolas' issue with static columns and 2i, I have one more potential issue. While the conditions validation now works properly, I'm on the fence whether or not we should perform the same validation on the cells, too.

          Think

          begin batch
          update foo set z = 13 where x = 'a' and y = 1 if z = 23;
          update foo set z = 12 where x = 'a' and y = 1 if z = 23;
          apply batch
          

          z = 13 and z = 12 updates are in conflict here, but the batch itself will be applied, with z = 13 cell winning the reconcile. While this is what we do for regular batches, I'm not sure if we should be doing the same for CAS batches, and not validate update conflicts the same way we validate the conditions.

          Show
          Aleksey Yeschenko added a comment - Nicolas Favre-Felix that's not the point of that change. The idea was for this to return the row and the static value, as expected: cqlsh:ks1> SELECT * FROM bills WHERE user='user1' and expense_id = 1000; user | expense_id | balance | amount | item | paid -------+------------+---------+--------+---------+------ user1 | 1000 | -200 | 8 | burrito | True (1 rows) But for this throw an IRE, to match UPDATE/DELETE behavior: cqlsh:ks1> SELECT balance FROM bills WHERE user='user1' and expense_id = 1000; Bad Request: Cannot restrict clustering columns when selecting only static columns And https://github.com/pcmanus/cassandra/commit/67b4f976b57399bcbb94b3b03eaaf71842b6e843 does exactly this. Your issue looks entirely unrelated to me (it's about static columns and 2i queries), but yeah, it's still an issue to be addressed in the next iteration. Sylvain Lebresne Looks almost LGTM. Other than the Nicolas' issue with static columns and 2i, I have one more potential issue. While the conditions validation now works properly, I'm on the fence whether or not we should perform the same validation on the cells, too. Think begin batch update foo set z = 13 where x = 'a' and y = 1 if z = 23; update foo set z = 12 where x = 'a' and y = 1 if z = 23; apply batch z = 13 and z = 12 updates are in conflict here, but the batch itself will be applied, with z = 13 cell winning the reconcile. While this is what we do for regular batches, I'm not sure if we should be doing the same for CAS batches, and not validate update conflicts the same way we validate the conditions.
          Hide
          Nicolas Favre-Felix added a comment -

          Thanks Sylvain for the new patches, it looks great.

          This query should throw IRE, just like DELETEs do now.

          I think this is a problem since there is no way to retrieve both a CQL row and the partition's static columns in a single SELECT.
          This is an issue since partition-level isolation guarantees that you don't see partial updates within a partition; if the whole point of static columns is to have a consistent view of both clustered and unclustered data within a partition, not being able to fetch both in a single operation makes this isolation property useless.

          Keeping the example of bills that have or haven't been paid, here's a table definition:

          CREATE TABLE bills (
              user text,
              balance bigint  static,
              expense_id bigint,
              amount bigint,
              item text,
              paid boolean,
              PRIMARY KEY (user, expense_id)
          );
          
          CREATE INDEX unpaid ON bills (paid);
          

          Let's create 2 expenses for a single user, with CAS updates:

          
          BEGIN BATCH
              INSERT INTO bills (user, expense_id, amount, item, paid) values ('user1', 1000, 8, 'burrito', false);
              INSERT INTO bills (user, balance) VALUES ('user1', -8) IF NOT EXISTS;
          APPLY BATCH;
          
          
          BEGIN BATCH
              INSERT INTO bills (user, expense_id, amount, item, paid) values ('user1', 2000, 200, 'hotel room', false);
              UPDATE bills SET balance = -208 WHERE user='user1' IF balance = -8;
          APPLY BATCH;
          

          They are both present:

          > SELECT * FROM bills WHERE user='user1';
          
           user  | expense_id | balance | amount | item       | paid
          -------+------------+---------+--------+------------+-------
           user1 |       1000 |    -208 |      8 |    burrito | False
           user1 |       2000 |    -208 |    200 | hotel room | False
          
          (2 rows)
          

          The great thing about using a single partition that's updated with CAS is that all queries that read the full partition will always see a consistent view of the data, and respect our invariants – in our case, that the sum of the amount for all unpaid bills + the balance is equal to zero.

          We can pay bills using CAS too – let's pay for the burrito:

          BEGIN BATCH
              UPDATE bills SET paid=true WHERE user='user1' AND expense_id=1000;
              UPDATE bills SET balance=-200 WHERE user='user1' IF balance=-208;
          APPLY BATCH;
          

          This works, of course, and any client that retrieve the full partition would either see balance=-208 and all bills unpaid, or balance=-200 and one bill paid, but never anything else.

          If we don't return the balance with the bill in a single SELECT, we lose the isolation property and the query for the balance could be out of date with the query for the bills themselves (SELECT * FROM bills WHERE user='user1' AND paid=false;)

          I'd argue that it is also confusing for users to see CQL rows with static columns filled in when they select the full partition but can't access the same data if they give the full PK coordinates of that row: you'd expect the second query to select a subset of the data extracted by the first.

          Show
          Nicolas Favre-Felix added a comment - Thanks Sylvain for the new patches, it looks great. This query should throw IRE, just like DELETEs do now. I think this is a problem since there is no way to retrieve both a CQL row and the partition's static columns in a single SELECT. This is an issue since partition-level isolation guarantees that you don't see partial updates within a partition; if the whole point of static columns is to have a consistent view of both clustered and unclustered data within a partition, not being able to fetch both in a single operation makes this isolation property useless. Keeping the example of bills that have or haven't been paid, here's a table definition: CREATE TABLE bills ( user text, balance bigint static , expense_id bigint, amount bigint, item text, paid boolean , PRIMARY KEY (user, expense_id) ); CREATE INDEX unpaid ON bills (paid); Let's create 2 expenses for a single user, with CAS updates: BEGIN BATCH INSERT INTO bills (user, expense_id, amount, item, paid) values ('user1', 1000, 8, 'burrito', false ); INSERT INTO bills (user, balance) VALUES ('user1', -8) IF NOT EXISTS; APPLY BATCH; BEGIN BATCH INSERT INTO bills (user, expense_id, amount, item, paid) values ('user1', 2000, 200, 'hotel room', false ); UPDATE bills SET balance = -208 WHERE user='user1' IF balance = -8; APPLY BATCH; They are both present: > SELECT * FROM bills WHERE user='user1'; user | expense_id | balance | amount | item | paid -------+------------+---------+--------+------------+------- user1 | 1000 | -208 | 8 | burrito | False user1 | 2000 | -208 | 200 | hotel room | False (2 rows) The great thing about using a single partition that's updated with CAS is that all queries that read the full partition will always see a consistent view of the data, and respect our invariants – in our case, that the sum of the amount for all unpaid bills + the balance is equal to zero. We can pay bills using CAS too – let's pay for the burrito: BEGIN BATCH UPDATE bills SET paid= true WHERE user='user1' AND expense_id=1000; UPDATE bills SET balance=-200 WHERE user='user1' IF balance=-208; APPLY BATCH; This works, of course, and any client that retrieve the full partition would either see balance=-208 and all bills unpaid, or balance=-200 and one bill paid, but never anything else. If we don't return the balance with the bill in a single SELECT, we lose the isolation property and the query for the balance could be out of date with the query for the bills themselves (SELECT * FROM bills WHERE user='user1' AND paid=false;) I'd argue that it is also confusing for users to see CQL rows with static columns filled in when they select the full partition but can't access the same data if they give the full PK coordinates of that row: you'd expect the second query to select a subset of the data extracted by the first.
          Hide
          Sylvain Lebresne added a comment -

          This query should throw IRE, just like DELETEs do now.
          Typo in ModificationStatement in "Invalid restriction on clustering column %s since %s statement only modify static columns" - should be "the statement %s modifies".

          Pushed a simple commit for those on the last branch (for ModificationStatement, I did fixed the message but kept "the %s statement" over "the statement %s", I think "since the UPDATE statement modifies ..." reads better "since the statement UPDATE modifies ...").

          I think we should be doing more extensive validation for cas batches.

          I agree, though that part is slightly annoying. We can't find duplicates until we have the bound values since we can't tell which row a statement applies to in general before that time. And with the current code, conditions reuse the Operation code which makes it slightly harder to do cleanly.

          Besides, as it turns out, conditions validation is broken for collections currently. We allow stuff like

          UPDTATE ... IF s = s + {2, 3}
          

          which is non-sensical. And collection conditions is even more broken in the sense that even

          UPDTATE ... IF s = {2, 3}
          

          is not properly handled as we only check that the set contains 2 and 3, but not that it contains only those.

          In any case, I think the proper way to handle all this is to refactor stuffs a bit so that conditions don't reuse Operation which is not adapted. To a large extent, fixing collections mishandling is not specific to this issue, but it bothers me to do a dirty fix for just the duplicate validation issue alone so I pushed a last commit (still on the same branch) that fixes all that. I'll note that this refactor will makes it relatively easy to add support for stuff like:

          UPDATE ... IF s CONTAINS 2;
          

          or even

          UPDATE ... IF v >= 2;
          

          which are surely nice to have, but that definitively belong to a followup ticket.

          Show
          Sylvain Lebresne added a comment - This query should throw IRE, just like DELETEs do now. Typo in ModificationStatement in "Invalid restriction on clustering column %s since %s statement only modify static columns" - should be "the statement %s modifies". Pushed a simple commit for those on the last branch (for ModificationStatement, I did fixed the message but kept "the %s statement" over "the statement %s", I think "since the UPDATE statement modifies ..." reads better "since the statement UPDATE modifies ..."). I think we should be doing more extensive validation for cas batches. I agree, though that part is slightly annoying. We can't find duplicates until we have the bound values since we can't tell which row a statement applies to in general before that time. And with the current code, conditions reuse the Operation code which makes it slightly harder to do cleanly. Besides, as it turns out, conditions validation is broken for collections currently. We allow stuff like UPDTATE ... IF s = s + {2, 3} which is non-sensical. And collection conditions is even more broken in the sense that even UPDTATE ... IF s = {2, 3} is not properly handled as we only check that the set contains 2 and 3, but not that it contains only those. In any case, I think the proper way to handle all this is to refactor stuffs a bit so that conditions don't reuse Operation which is not adapted. To a large extent, fixing collections mishandling is not specific to this issue, but it bothers me to do a dirty fix for just the duplicate validation issue alone so I pushed a last commit (still on the same branch) that fixes all that. I'll note that this refactor will makes it relatively easy to add support for stuff like: UPDATE ... IF s CONTAINS 2; or even UPDATE ... IF v >= 2; which are surely nice to have, but that definitively belong to a followup ticket.
          Hide
          Aleksey Yeschenko added a comment - - edited

          Also, I think we should be doing more extensive validation for cas batches. As of now, the following batch will execute successfully, assuming t == 5, but will fail, assuming t == 4.

          begin batch
          update foo set z = 1 where x = 'a' and y = 1 if t = 5;
          update foo set z = 2 where x = 'a' and y = 2 if t = 4;
          apply batch
          

          Another scenario:

          begin batch
          update foo set z = 13 where x = 'a' and y = 1 if z = 23;
          update foo set z = 12 where x = 'a' and y = 1 if z = 22;
          apply batch
          

          If z == 23, the batch will succeed. With z == 22 it would not. And even if it does, 'z' will be set to 12, making it even more confusing/undefined. Such batches should be rejected.

          In other words, we should not allow updating overlapping cells and having overlapping conditions.

          Should also consider the implications of UnsortedColumns replaced by ABSC in trunk, just in case. (also use addAll in BS.executeWithConditions, b/c resolve() there is gone).

          P.S. Updated cqlsh to handle multiple rows returned from a failed CAS batch updated in bbc56eb60d9c9629deb231131d65cf55ee15c2db.

          Show
          Aleksey Yeschenko added a comment - - edited Also, I think we should be doing more extensive validation for cas batches. As of now, the following batch will execute successfully, assuming t == 5, but will fail, assuming t == 4. begin batch update foo set z = 1 where x = 'a' and y = 1 if t = 5; update foo set z = 2 where x = 'a' and y = 2 if t = 4; apply batch Another scenario: begin batch update foo set z = 13 where x = 'a' and y = 1 if z = 23; update foo set z = 12 where x = 'a' and y = 1 if z = 22; apply batch If z == 23, the batch will succeed. With z == 22 it would not. And even if it does, 'z' will be set to 12, making it even more confusing/undefined. Such batches should be rejected. In other words, we should not allow updating overlapping cells and having overlapping conditions. Should also consider the implications of UnsortedColumns replaced by ABSC in trunk, just in case. (also use addAll in BS.executeWithConditions, b/c resolve() there is gone). P.S. Updated cqlsh to handle multiple rows returned from a failed CAS batch updated in bbc56eb60d9c9629deb231131d65cf55ee15c2db.
          Hide
          Aleksey Yeschenko added a comment - - edited

          Using Nicolas Favre-Felix's example,

          cqlsh:ks1>  select t from foo where x = 'a' and y = 1;
          
          (0 rows)
          

          This query should throw IRE, just like DELETEs do now.

          Typo in ModificationStatement in "Invalid restriction on clustering column %s since %s statement only modify static columns" - should be "the statement %s modifies".

          Other than that, LGTM.

          Show
          Aleksey Yeschenko added a comment - - edited Using Nicolas Favre-Felix 's example, cqlsh:ks1> select t from foo where x = 'a' and y = 1; (0 rows) This query should throw IRE, just like DELETEs do now. Typo in ModificationStatement in "Invalid restriction on clustering column %s since %s statement only modify static columns" - should be "the statement %s modifies". Other than that, LGTM.
          Hide
          Sylvain Lebresne added a comment -

          Pushed an additional commit on the previous branch to address the SELECT problems mentioned by Nicolas. Turns out reversed queries where not properly handled either, the "static slice" was not properly reversed so fixed that too.

          Show
          Sylvain Lebresne added a comment - Pushed an additional commit on the previous branch to address the SELECT problems mentioned by Nicolas. Turns out reversed queries where not properly handled either, the "static slice" was not properly reversed so fixed that too.
          Hide
          Aleksey Yeschenko added a comment -

          I have a small reluctance to that because that makes things slightly more annoying for clients

          I don't feel strongly about that point, which is why a labeled it a wish, so forget about it.

          I'll have one final look as soon as Nicolas Favre-Felix's comments are addressed.

          Show
          Aleksey Yeschenko added a comment - I have a small reluctance to that because that makes things slightly more annoying for clients I don't feel strongly about that point, which is why a labeled it a wish, so forget about it. I'll have one final look as soon as Nicolas Favre-Felix 's comments are addressed.
          Hide
          Nicolas Favre-Felix added a comment -

          Hi,

          I'd like to add that it is important to be able to fetch both a static column and clustered columns in a single select; this does not seem possible at the moment:

          cqlsh:ks1> CREATE TABLE foo (
            x text,
            y bigint,
            t bigint static,
            z bigint,
            PRIMARY KEY (x, y)
          );
          
          cqlsh:ks1> insert into foo (x,y,z) values ('a', 1, 10);
          cqlsh:ks1> insert into foo (x,y,z) values ('a', 2, 20);
          cqlsh:ks1> update foo set t = 2 where x='a';
          cqlsh:ks1> select * from foo;
          
           x | y | t | z
          ---+---+---+----
           a | 1 | 2 | 10
           a | 2 | 2 | 20
          
          (2 rows)
          

          Here we have a select over a whole partition and it pulls the static column just fine. Selecting a CQL row works, of course, and selecting a static column does too:

          cqlsh:ks1> select x,y,z from foo where x='a' and y=1;
          
           x | y | z
          ---+---+----
           a | 1 | 10
          
          (1 rows)
          
          cqlsh:ks1> select t from foo where x='a';
          
           t
          ---
           2
          
          (1 rows)
          

          But selecting them together fails to return anything:

          cqlsh:ks1> select x,y,z,t from foo where x='a' and y=1;
          
          (0 rows)
          

          Now this does partly make sense because there isn't really a value for "t" where y=1 since "t" isn't clustered. But it is important to be consistent with the output for the full table.

          Note that querying the full partition returns only the static column now:

          cqlsh:ks1> select x,y,z,t from foo where x='a';
          
           x | y    | z    | t
          ---+------+------+---
           a | null | null | 4
          
          (1 rows)
          

          Currently, the patches add support for:

          • Selecting a CQL row by primary key (that's a standard feature).
          • Selecting a static column by partition key (added by Sylvain).

          So I'd say it's important to be able to support:

          • Selecting clustered as well as static columns for a given CQL row.
          • Selecting clustered as well as static columns for a given partition.

          Not being able to fetch both the static column and a CQL row or set of CQL rows in a single read makes it impossible to rely on partition-level isolation for consistent reads.

          Show
          Nicolas Favre-Felix added a comment - Hi, I'd like to add that it is important to be able to fetch both a static column and clustered columns in a single select; this does not seem possible at the moment: cqlsh:ks1> CREATE TABLE foo ( x text, y bigint, t bigint static , z bigint, PRIMARY KEY (x, y) ); cqlsh:ks1> insert into foo (x,y,z) values ('a', 1, 10); cqlsh:ks1> insert into foo (x,y,z) values ('a', 2, 20); cqlsh:ks1> update foo set t = 2 where x='a'; cqlsh:ks1> select * from foo; x | y | t | z ---+---+---+---- a | 1 | 2 | 10 a | 2 | 2 | 20 (2 rows) Here we have a select over a whole partition and it pulls the static column just fine. Selecting a CQL row works, of course, and selecting a static column does too: cqlsh:ks1> select x,y,z from foo where x='a' and y=1; x | y | z ---+---+---- a | 1 | 10 (1 rows) cqlsh:ks1> select t from foo where x='a'; t --- 2 (1 rows) But selecting them together fails to return anything: cqlsh:ks1> select x,y,z,t from foo where x='a' and y=1; (0 rows) Now this does partly make sense because there isn't really a value for "t" where y=1 since "t" isn't clustered. But it is important to be consistent with the output for the full table. Note that querying the full partition returns only the static column now: cqlsh:ks1> select x,y,z,t from foo where x='a'; x | y | z | t ---+------+------+--- a | null | null | 4 (1 rows) Currently, the patches add support for: Selecting a CQL row by primary key (that's a standard feature). Selecting a static column by partition key (added by Sylvain). So I'd say it's important to be able to support: Selecting clustered as well as static columns for a given CQL row. Selecting clustered as well as static columns for a given partition. Not being able to fetch both the static column and a CQL row or set of CQL rows in a single read makes it impossible to rely on partition-level isolation for consistent reads.
          Hide
          Sylvain Lebresne added a comment -

          Pushed an additional commit at https://github.com/pcmanus/cassandra/commits/6561-4 (it's a new branch because I rebased out of habit, but only the last commit is new) that address most the remarks.

          "DELETE static_col FROM table WHERE partition_key = X and clustering_col = Y" should? probably throw an error, and not delete the static_col? (not sure about this one)

          Not sure either . But it's definitively dodgy and we can always refuse it now and change our mind later (while the contrary is harder) so rejecting it now. It also rejects "UPDATE table SET static_col=X WHERE partition_key=Y and clustering_col=Z" because that's really kind of the same thing.

          I'd prefer BEGIN CAS BATCH, b/c currently you can use either BEGIN UNLOGGED BATCH or BEGIN LOGGED BATCH with conditions, and it confuses things.

          I have a small reluctance to that because that makes things slightly more annoying for clients (you might have to special case code just to indicate the batch contains conditions where really this is not necessary, you already know if you've put conditions or not, users are adults). And I'm not really worried about UNLOGGED versus LOGGED being equivalent since that's already the case for "normal" operations on the same partition anyway. Another detail is that for batch in the native protocol, it would add a new type, and while arguably it doesn't require a protocol bump per-se, but would still require client drivers to update and would provide a bit more confusion for users as to which pair of client/server version support those batch or not at the protocol level.

          None of this is major and I don't feel extremely strongly either way, but I don't think forcing users to explicitly label the batch really buys anything outside of some annoyance so I'd rather not do it.

          Note that I realise the same could be said for COUNTER batches but 1) I do think in hindsight that requiring that label adds a minor annoyance without really buying us much and we could have avoided it (don't get me wrong, it's absolutely minor and I'm not suggesting changing it now) and 2) it's probably somewhat more justified for counters because those are really segregated to their own tables.

          Show
          Sylvain Lebresne added a comment - Pushed an additional commit at https://github.com/pcmanus/cassandra/commits/6561-4 (it's a new branch because I rebased out of habit, but only the last commit is new) that address most the remarks. "DELETE static_col FROM table WHERE partition_key = X and clustering_col = Y" should? probably throw an error, and not delete the static_col? (not sure about this one) Not sure either . But it's definitively dodgy and we can always refuse it now and change our mind later (while the contrary is harder) so rejecting it now. It also rejects "UPDATE table SET static_col=X WHERE partition_key=Y and clustering_col=Z" because that's really kind of the same thing. I'd prefer BEGIN CAS BATCH, b/c currently you can use either BEGIN UNLOGGED BATCH or BEGIN LOGGED BATCH with conditions, and it confuses things. I have a small reluctance to that because that makes things slightly more annoying for clients (you might have to special case code just to indicate the batch contains conditions where really this is not necessary, you already know if you've put conditions or not, users are adults). And I'm not really worried about UNLOGGED versus LOGGED being equivalent since that's already the case for "normal" operations on the same partition anyway. Another detail is that for batch in the native protocol, it would add a new type, and while arguably it doesn't require a protocol bump per-se, but would still require client drivers to update and would provide a bit more confusion for users as to which pair of client/server version support those batch or not at the protocol level. None of this is major and I don't feel extremely strongly either way, but I don't think forcing users to explicitly label the batch really buys anything outside of some annoyance so I'd rather not do it. Note that I realise the same could be said for COUNTER batches but 1) I do think in hindsight that requiring that label adds a minor annoyance without really buying us much and we could have avoided it (don't get me wrong, it's absolutely minor and I'm not suggesting changing it now) and 2) it's probably somewhat more justified for counters because those are really segregated to their own tables.
          Hide
          Aleksey Yeschenko added a comment -

          All right.

          Issues:

          • ALTER TABLE RENAME allows renaming static columns (should not)
          • "DELETE static_col FROM table WHERE partition_key = X and clustering_col = Y" should? probably throw an error, and not delete the static_col? (not sure about this one)
          • continuing with the DELETEs, "DELETE static_col, regular_col FROM table WHERE partition_key = X" will throw "Missing mandatory PRIMARY KEY part ck since static_col specified", which is minor, of course, but should pick first non-static column instead (DeleteStatement)

          Nit: CFMetaData.SchemaColumnsCf still has 'is_static' in it

          Typo: 'buildt' in 'Note that we know that only the first group buildt can be static' in ColumnGroupMap

          Wishes:

          • I'd prefer BEGIN CAS BATCH, b/c currently you can use either BEGIN UNLOGGED BATCH or BEGIN [LOGGED] BATCH with conditions, and it confuses things. I prefer explicit here.
          • I'd prefer maybeUpdatePrefix() to updatePrefix()
          Show
          Aleksey Yeschenko added a comment - All right. Issues: ALTER TABLE RENAME allows renaming static columns (should not) "DELETE static_col FROM table WHERE partition_key = X and clustering_col = Y" should? probably throw an error, and not delete the static_col? (not sure about this one) continuing with the DELETEs, "DELETE static_col, regular_col FROM table WHERE partition_key = X" will throw "Missing mandatory PRIMARY KEY part ck since static_col specified", which is minor, of course, but should pick first non-static column instead (DeleteStatement) Nit: CFMetaData.SchemaColumnsCf still has 'is_static' in it Typo: 'buildt' in 'Note that we know that only the first group buildt can be static' in ColumnGroupMap Wishes: I'd prefer BEGIN CAS BATCH, b/c currently you can use either BEGIN UNLOGGED BATCH or BEGIN [LOGGED] BATCH with conditions, and it confuses things. I prefer explicit here. I'd prefer maybeUpdatePrefix() to updatePrefix()
          Hide
          Sylvain Lebresne added a comment -

          ALTER TABLE ADD should support adding a static column, but doesn't

          Right, pushed an additional commit for that on the same branch than before (https://github.com/pcmanus/cassandra/commits/6561-3). I updated the dtest for that too.

          dropping a static column doesn't work fully (it won't be compacted away)

          You might have to be more specific than that. As far as I can tell, there is nothing special that should be done for static columns outside of making sure the column name does get added to droppedColumns and that's the case. I confirmed that with a quick manual test too: unless sstable2json is lying to me, the dropped static columns does got compacted away.

          Show
          Sylvain Lebresne added a comment - ALTER TABLE ADD should support adding a static column, but doesn't Right, pushed an additional commit for that on the same branch than before ( https://github.com/pcmanus/cassandra/commits/6561-3 ). I updated the dtest for that too. dropping a static column doesn't work fully (it won't be compacted away) You might have to be more specific than that. As far as I can tell, there is nothing special that should be done for static columns outside of making sure the column name does get added to droppedColumns and that's the case. I confirmed that with a quick manual test too: unless sstable2json is lying to me, the dropped static columns does got compacted away.
          Hide
          Aleksey Yeschenko added a comment -

          Still not a full review, just things that don't work:

          • ALTER TABLE ADD should support adding a static column, but doesn't
          • dropping a static column doesn't work fully (it won't be compacted away)
          Show
          Aleksey Yeschenko added a comment - Still not a full review, just things that don't work: ALTER TABLE ADD should support adding a static column, but doesn't dropping a static column doesn't work fully (it won't be compacted away)
          Hide
          Sylvain Lebresne added a comment -

          Pushed rebased and generally updated version at https://github.com/pcmanus/cassandra/commits/6561-3.

          One important change of this version is that I realized that the "trick" of reusing the end-of-component in a composite was not working, messing with the end-of-component was problematic for slices and some edge cases were not working properly. The new version uses a somewhat more direct approach of prefixing the name of static columns by 0xFFFF, and CompositeType checks for that. The reason for that value is that it was actually not possible for an existing composite name to have a first component of 65535 bytes, since the full cell name is limited to that size and a composite name has at least 3 bytes more that it's component (even if it has only one of them).

          Another point that is worth noting is that the new patch now properly handles any mix of conditions in batches (as long as it's only on one partition), include IF NOT EXISTS ones (which was not the case of the previous version).

          I'd prefer new ColumnDefinition.Kind.STATIC

          I switched to that with the last commit on the branch above. I kind of agree that it's probably a little cleaner, though it's worth noting that it's not necessarily simpler because that kind of mean separating regular and static columns while there is many cases where they are treated the same way. Anyway, not a huge deal either. I also ended-up renaming a bunch of stuffs in CFDefinition to keep things sane. Admittingly not all the renames were absolutely necessary for this patch but well, they make things cleaner.

          Haven't fixed cqlsh describe yet as cqlsh is not the code I know the best but I can have a look at it a

          Show
          Sylvain Lebresne added a comment - Pushed rebased and generally updated version at https://github.com/pcmanus/cassandra/commits/6561-3 . One important change of this version is that I realized that the "trick" of reusing the end-of-component in a composite was not working, messing with the end-of-component was problematic for slices and some edge cases were not working properly. The new version uses a somewhat more direct approach of prefixing the name of static columns by 0xFFFF, and CompositeType checks for that. The reason for that value is that it was actually not possible for an existing composite name to have a first component of 65535 bytes, since the full cell name is limited to that size and a composite name has at least 3 bytes more that it's component (even if it has only one of them). Another point that is worth noting is that the new patch now properly handles any mix of conditions in batches (as long as it's only on one partition), include IF NOT EXISTS ones (which was not the case of the previous version). I'd prefer new ColumnDefinition.Kind.STATIC I switched to that with the last commit on the branch above. I kind of agree that it's probably a little cleaner, though it's worth noting that it's not necessarily simpler because that kind of mean separating regular and static columns while there is many cases where they are treated the same way. Anyway, not a huge deal either. I also ended-up renaming a bunch of stuffs in CFDefinition to keep things sane. Admittingly not all the renames were absolutely necessary for this patch but well, they make things cleaner. Haven't fixed cqlsh describe yet as cqlsh is not the code I know the best but I can have a look at it a
          Hide
          Aleksey Yeschenko added a comment -

          Nicolas Favre-Felix Oh, makes sense. As I said, I only skimmed it. Never mind that point then. Still think ColumnDefinition.Kind.STATIC should be introduced though.

          Show
          Aleksey Yeschenko added a comment - Nicolas Favre-Felix Oh, makes sense. As I said, I only skimmed it. Never mind that point then. Still think ColumnDefinition.Kind.STATIC should be introduced though.
          Hide
          Nicolas Favre-Felix added a comment -

          Aleksey Yeschenko: the patch above enables CAS on several cells within a single partition too, in addition to supporting CAS on a static column.
          For example:

          BEGIN BATCH
              UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='123' IF tx_token='abc';
              UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='456' IF tx_token='def';
              UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='789' IF tx_token='ghi';
          APPLY BATCH;
          

          This is already something we can do with Thrift and composite columns; it is very useful to CAS-update several CQL rows within the same partition.

          Sylvain's branch introduces support for both multi-CQL-row CAS in a partition with any number of cells to check as well as support for a static column within a partition. Both are important, imho.

          Show
          Nicolas Favre-Felix added a comment - Aleksey Yeschenko : the patch above enables CAS on several cells within a single partition too, in addition to supporting CAS on a static column. For example: BEGIN BATCH UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='123' IF tx_token='abc'; UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='456' IF tx_token='def'; UPDATE transactions SET paid='Y' WHERE user='foo' AND txid='789' IF tx_token='ghi'; APPLY BATCH; This is already something we can do with Thrift and composite columns; it is very useful to CAS-update several CQL rows within the same partition. Sylvain's branch introduces support for both multi-CQL-row CAS in a partition with any number of cells to check as well as support for a static column within a partition. Both are important, imho.
          Hide
          Aleksey Yeschenko added a comment -

          I think I'd prefer

          BEGIN BATCH
              UPDATE test SET v='foobar' WHERE id=0 AND k='k1';
              UPDATE test SET v='barfoo' WHERE id=0 AND k='k2';
              UPDATE test SET version=3 WHERE id=0;
          APPLY BATCH IF version=1;
          

          to

          BEGIN BATCH
              UPDATE test SET v='foobar' WHERE id=0 AND k='k1';
              UPDATE test SET v='barfoo' WHERE id=0 AND k='k2';
              UPDATE test SET version=3 WHERE id=0 IF version=1;
          APPLY BATCH
          

          To make it clear that only one condition applies to the whole modified partition. Maybe even make it BEGIN CAS BATCH, for explicitness sake, even though we can infer it from IF. (moving IF outside also makes it easier to support IF NOT EXISTS properly).

          Show
          Aleksey Yeschenko added a comment - I think I'd prefer BEGIN BATCH UPDATE test SET v='foobar' WHERE id=0 AND k='k1'; UPDATE test SET v='barfoo' WHERE id=0 AND k='k2'; UPDATE test SET version=3 WHERE id=0; APPLY BATCH IF version=1; to BEGIN BATCH UPDATE test SET v='foobar' WHERE id=0 AND k='k1'; UPDATE test SET v='barfoo' WHERE id=0 AND k='k2'; UPDATE test SET version=3 WHERE id=0 IF version=1; APPLY BATCH To make it clear that only one condition applies to the whole modified partition. Maybe even make it BEGIN CAS BATCH, for explicitness sake, even though we can infer it from IF. (moving IF outside also makes it easier to support IF NOT EXISTS properly).
          Hide
          Aleksey Yeschenko added a comment -

          Only just skimmed it a little. I don't think is_static boolean is the way to go - I'd prefer new ColumnDefinition.Kind.STATIC here.

          Also not sure about the BATCH syntax here (but don't have a better solution in mind, yet).

          Show
          Aleksey Yeschenko added a comment - Only just skimmed it a little. I don't think is_static boolean is the way to go - I'd prefer new ColumnDefinition.Kind.STATIC here. Also not sure about the BATCH syntax here (but don't have a better solution in mind, yet).
          Hide
          Aleksey Yeschenko added a comment -

          Sylvain Lebresne now that 6623 is +1d, can you commit it and rebase 6561 before I review it? I'll go review CASSANDRA-4851 for now.

          Show
          Aleksey Yeschenko added a comment - Sylvain Lebresne now that 6623 is +1d, can you commit it and rebase 6561 before I review it? I'll go review CASSANDRA-4851 for now.
          Hide
          Nicolas Favre-Felix added a comment -

          Thanks Sylvain. One more thing: it seems that the "static" suffix is not currently added to the column definition printed by "DESCRIBE TABLE foo;"

          Show
          Nicolas Favre-Felix added a comment - Thanks Sylvain. One more thing: it seems that the "static" suffix is not currently added to the column definition printed by "DESCRIBE TABLE foo;"
          Hide
          Sylvain Lebresne added a comment -

          I'll note that regarding batching CAS, the branch above does not handle batching 'INSERT IF NOT EXISTS', only 'UPDATE IF'. We should lift that limitation but we need the refactoring from CASSANDRA-6623's patch for that, so I'll wait for the latter to be committed to rebase everything and add that last bit. Probably not worth holding on the review for that though

          Show
          Sylvain Lebresne added a comment - I'll note that regarding batching CAS, the branch above does not handle batching 'INSERT IF NOT EXISTS', only 'UPDATE IF'. We should lift that limitation but we need the refactoring from CASSANDRA-6623 's patch for that, so I'll wait for the latter to be committed to rebase everything and add that last bit. Probably not worth holding on the review for that though
          Hide
          Sylvain Lebresne added a comment -

          Concerning CAS, the previous patches were covering the ability to serialize updates to a given partition (through CASing a static columns), but to be a full alternative to CASSANDRA-5633 we still need to allow batching with conditions. The good news is that we don't need extra syntax for that, we just need to allow IF in batches which is relatively natural (of course, we need to limit batch with conditions to be only on one partition since we don't support cross-partition CAS).

          I've pushed a rebased version of the branch above with an additional patch to handle batching at https://github.com/pcmanus/cassandra/commits/6561-2. I've also updated the dtests, so examples of how it looks like is at https://github.com/riptano/cassandra-dtest/commit/dee0d8f6bcf4816cc0690b001f875929fd69dfb4.

          Show
          Sylvain Lebresne added a comment - Concerning CAS, the previous patches were covering the ability to serialize updates to a given partition (through CASing a static columns), but to be a full alternative to CASSANDRA-5633 we still need to allow batching with conditions. The good news is that we don't need extra syntax for that, we just need to allow IF in batches which is relatively natural (of course, we need to limit batch with conditions to be only on one partition since we don't support cross-partition CAS). I've pushed a rebased version of the branch above with an additional patch to handle batching at https://github.com/pcmanus/cassandra/commits/6561-2 . I've also updated the dtests, so examples of how it looks like is at https://github.com/riptano/cassandra-dtest/commit/dee0d8f6bcf4816cc0690b001f875929fd69dfb4 .
          Hide
          Brandon Williams added a comment -

          Given that CASSANDRA-5633 is something doable in thrift, but not CQL, I'm +1 on putting this in 2.0.

          Show
          Brandon Williams added a comment - Given that CASSANDRA-5633 is something doable in thrift, but not CQL, I'm +1 on putting this in 2.0.
          Hide
          Sylvain Lebresne added a comment -

          To quantify my "I think we can support this relatively easily", I've pushed an initial version of this at https://github.com/pcmanus/cassandra/commits/6561.

          I'm calling this an initial version because there is still a few things that I need to test and it probably lacks a few validations here and there, but I believe this support pretty much the majority of what we'd want here (that is, a final version might have a few more lines, but in term of complexity that's basically it). I've also pushed 2 dtests at https://github.com/riptano/cassandra-dtest/commit/21fb90e03edcb452feb98027a0272b47de9efb07 that demonstrate the basic functionalities and give a more concrete idea of how this actually work API wise. I think it's fair to say that while not trivial, this patch is really not extremely complicated.

          I'll note that the patch is against 2.0. That doesn't mean I'm strongly set with pushing that for 2.0, I'm not. But we did kind of said that we'd try to support CASSANDRA-5633 use case in 2.0, and since that's what might probably replace it, I wanted to get a feel of how much code change this would imply for 2.0. If we decide 2.0 is out of question, I'll gladly rebase against trunk.

          Show
          Sylvain Lebresne added a comment - To quantify my "I think we can support this relatively easily", I've pushed an initial version of this at https://github.com/pcmanus/cassandra/commits/6561 . I'm calling this an initial version because there is still a few things that I need to test and it probably lacks a few validations here and there, but I believe this support pretty much the majority of what we'd want here (that is, a final version might have a few more lines, but in term of complexity that's basically it). I've also pushed 2 dtests at https://github.com/riptano/cassandra-dtest/commit/21fb90e03edcb452feb98027a0272b47de9efb07 that demonstrate the basic functionalities and give a more concrete idea of how this actually work API wise. I think it's fair to say that while not trivial, this patch is really not extremely complicated. I'll note that the patch is against 2.0. That doesn't mean I'm strongly set with pushing that for 2.0, I'm not. But we did kind of said that we'd try to support CASSANDRA-5633 use case in 2.0, and since that's what might probably replace it, I wanted to get a feel of how much code change this would imply for 2.0. If we decide 2.0 is out of question, I'll gladly rebase against trunk.

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Sylvain Lebresne
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              1 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development