Cassandra
  1. Cassandra
  2. CASSANDRA-3647

Support collection (list, set, and map) value types in CQL

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: API, Core
    • Labels:

      Description

      Composite columns introduce the ability to have arbitrarily nested data in a Cassandra row. We should expose this through CQL.

      1. 3647-nit.txt
        0.9 kB
        paul cannon
      2. CASSANDRA-3647-alternative.patch
        77 kB
        Pavel Yaskevich

        Issue Links

          Activity

          Hide
          Jonathan Ellis added a comment -

          Ed's AnyType is useful here, since you really need "self describing data" for documents: CASSANDRA-3281

          Show
          Jonathan Ellis added a comment - Ed's AnyType is useful here, since you really need "self describing data" for documents: CASSANDRA-3281
          Hide
          Jonathan Ellis added a comment -

          Sylvain suggests we start with (non-nested) lists, maps, and sets. I agree that this is a great 80/20 approach to the problem (and does not prevent us from generalizing further in the future).

          Show
          Jonathan Ellis added a comment - Sylvain suggests we start with (non-nested) lists, maps, and sets. I agree that this is a great 80/20 approach to the problem (and does not prevent us from generalizing further in the future).
          Hide
          Rick Branson added a comment -

          I hate to gold plate here, but this feature is of marginal utility and could be implemented on-top of a UDF facility, similar to how PostgreSQL supports XML and (in 9.2) JSON formats.

          Show
          Rick Branson added a comment - I hate to gold plate here, but this feature is of marginal utility and could be implemented on-top of a UDF facility, similar to how PostgreSQL supports XML and (in 9.2) JSON formats.
          Hide
          Jonathan Ellis added a comment -

          If by "this feature" you mean the full document support in the title, you could be right.

          If you mean the list and map types I propopsed, I disagree on both counts. It's very useful for us to be able to denormalize maps or lists into a field, instead of doing a client-side join. The reason I want this fully supported instead of just a glorified blob is to do efficient updates (appends/pops for lists, insert/deletes for maps).

          My quick googling suggests that postgresql gives you array_to_json and row_to_json, which is fine as far as it goes but doesn't accomplish the above. (Note that with their array type, pg comes close to what I have in mind for the list type here.)

          If it would be less confusing to move "add map and list types" to a separate ticket, that's fine.

          Show
          Jonathan Ellis added a comment - If by "this feature" you mean the full document support in the title, you could be right. If you mean the list and map types I propopsed, I disagree on both counts. It's very useful for us to be able to denormalize maps or lists into a field, instead of doing a client-side join. The reason I want this fully supported instead of just a glorified blob is to do efficient updates (appends/pops for lists, insert/deletes for maps). My quick googling suggests that postgresql gives you array_to_json and row_to_json, which is fine as far as it goes but doesn't accomplish the above. (Note that with their array type, pg comes close to what I have in mind for the list type here.) If it would be less confusing to move "add map and list types" to a separate ticket, that's fine.
          Hide
          Jeremiah Jordan added a comment -

          For the maps/lists/sets, are you talking about a standard way of using composite columns to implement them, or, a new column type that has a map/list/set inside one column, and new operations to work with them? Similar to counters?

          I think this Issue started about having a standard way for CQL to store document attributes in multiple columns using multiple columns and composites with slices to get/set them.

          So if you want to make a standard way of setting up composite columns for maps/lists/sets, I think this issue can be hi-jacked for that. If you want to add a new type of column that supports redis like map/set/list operations, I would make a new issue.

          Show
          Jeremiah Jordan added a comment - For the maps/lists/sets, are you talking about a standard way of using composite columns to implement them, or, a new column type that has a map/list/set inside one column, and new operations to work with them? Similar to counters? I think this Issue started about having a standard way for CQL to store document attributes in multiple columns using multiple columns and composites with slices to get/set them. So if you want to make a standard way of setting up composite columns for maps/lists/sets, I think this issue can be hi-jacked for that. If you want to add a new type of column that supports redis like map/set/list operations, I would make a new issue.
          Hide
          Jonathan Ellis added a comment -

          So if you want to make a standard way of setting up composite columns for maps/lists/sets, I think this issue can be hi-jacked for that. If you want to add a new type of column that supports redis like map/set/list operations, I would make a new issue.

          Well, it's both. Because we do want the latter, but implemented as the former. I suppose it's reasonable to split out CQL operations (list append/pop, map get/put/remove, set add/remove) to another ticket.

          So first, we're going to need to support heterogeneous comparators somehow. Consider this table declaration:

          CREATE TABLE foo (
              id int PRIMARY KEY,
              field1 text,
              field2 map<int, text>,
              field3 list<text>
          );
          

          The Cassandra CF containing these rows will contain single-level columns (field1), CT(ascii, int) (field2), and CT(ascii, uuid) (field3), assuming that we represent lists with v1 uuid column names, which seems like the best option to me.

          CASSANDRA-3657 gets us part of the way there (all CF column names will have the same prefix, which is the CQL column name) but not all the way.

          Show
          Jonathan Ellis added a comment - So if you want to make a standard way of setting up composite columns for maps/lists/sets, I think this issue can be hi-jacked for that. If you want to add a new type of column that supports redis like map/set/list operations, I would make a new issue. Well, it's both. Because we do want the latter, but implemented as the former. I suppose it's reasonable to split out CQL operations (list append/pop, map get/put/remove, set add/remove) to another ticket. So first, we're going to need to support heterogeneous comparators somehow. Consider this table declaration: CREATE TABLE foo ( id int PRIMARY KEY, field1 text, field2 map< int , text>, field3 list<text> ); The Cassandra CF containing these rows will contain single-level columns ( field1 ), CT(ascii, int) ( field2 ), and CT(ascii, uuid) ( field3 ), assuming that we represent lists with v1 uuid column names, which seems like the best option to me. CASSANDRA-3657 gets us part of the way there (all CF column names will have the same prefix, which is the CQL column name) but not all the way.
          Hide
          Jonathan Ellis added a comment -

          The alternative is we say "all maps, lists, and sets will be represented as CT(ascii, UTF8)" and json-encode things. On the one hand, that's horribly inefficient space-wise for ints and uuids. On the other hand, json support is high on the list of motivations here, and that would simplify things a lot. (And if we used a library to represent utf8 as byte[] instead of String, that would mitigate the overhead of representing as json by roughly 50% for common usage.)

          Show
          Jonathan Ellis added a comment - The alternative is we say "all maps, lists, and sets will be represented as CT(ascii, UTF8)" and json-encode things. On the one hand, that's horribly inefficient space-wise for ints and uuids. On the other hand, json support is high on the list of motivations here, and that would simplify things a lot. (And if we used a library to represent utf8 as byte[] instead of String, that would mitigate the overhead of representing as json by roughly 50% for common usage.)
          Hide
          Jonathan Ellis added a comment -

          a library to represent utf8 as byte[] instead of String

          e.g., https://github.com/jruby/bytelist/blob/master/src/org/jruby/util/ByteList.java

          Show
          Jonathan Ellis added a comment - a library to represent utf8 as byte[] instead of String e.g., https://github.com/jruby/bytelist/blob/master/src/org/jruby/util/ByteList.java
          Hide
          Sylvain Lebresne added a comment -

          The Cassandra CF containing these rows will contain single-level columns (field1), CT(ascii, int) (field2), and CT(ascii, uuid) (field3), assuming that we represent lists with v1 uuid column names, which seems like the best option to me.

          One option would be to write a separate, specific, comparator that would basically be 'nothing or map or list or whateverelsewelladd'. So typically the comparator for exemple above would be: CompositeType(UTF8Type, NewShinyComparator). This avoids to mess with the composite type itself.

          Going further, we could very well have a UnionType comparator that takes argument and allow a component to be one of different comparator (it could use the same kind of idea than DynamicCompositeType to differenciate between different elements). So basically we would have: CompositeType(UTF8Type, UnionType(MapType(ÅsciiType), ListType(AsciiType))).

          I quite like that idea of composition of comparators. Typically CompositeType with UnionType would be equivalent to DynamicCompositeType but with even more flexibility (as you would be able to 'fix' some of the component, which is useful (especially for us in CQL3)).

          Show
          Sylvain Lebresne added a comment - The Cassandra CF containing these rows will contain single-level columns (field1), CT(ascii, int) (field2), and CT(ascii, uuid) (field3), assuming that we represent lists with v1 uuid column names, which seems like the best option to me. One option would be to write a separate, specific, comparator that would basically be 'nothing or map or list or whateverelsewelladd'. So typically the comparator for exemple above would be: CompositeType(UTF8Type, NewShinyComparator). This avoids to mess with the composite type itself. Going further, we could very well have a UnionType comparator that takes argument and allow a component to be one of different comparator (it could use the same kind of idea than DynamicCompositeType to differenciate between different elements). So basically we would have: CompositeType(UTF8Type, UnionType(MapType(ÅsciiType), ListType(AsciiType))). I quite like that idea of composition of comparators. Typically CompositeType with UnionType would be equivalent to DynamicCompositeType but with even more flexibility (as you would be able to 'fix' some of the component, which is useful (especially for us in CQL3)).
          Hide
          Ben McCann added a comment -

          I started using ElasticSearch to get massively scalable storage of nested JSON documents. They accept smile (https://github.com/FasterXML/jackson-dataformat-smile) as well, which I've found to be very nice. It's a binary version of JSON, so it's much faster to encode/decode and more space efficient.

          Show
          Ben McCann added a comment - I started using ElasticSearch to get massively scalable storage of nested JSON documents. They accept smile ( https://github.com/FasterXML/jackson-dataformat-smile ) as well, which I've found to be very nice. It's a binary version of JSON, so it's much faster to encode/decode and more space efficient.
          Hide
          Jonathan Ellis added a comment - - edited

          we could very well have a UnionType comparator that takes argument and allow a component to be one of different comparator

          Maybe. Note that we never actually need to compare the different types, since sub-components of types X and Y will always have a different parent component. We just need to allow them.

          Also, whether the field is a map/list/set is irrelevant for the purposes of the storage engine. (All the operations I propose can be done as a single CT insert operation, without read-before-write. Except for pop, which I didn't think through and I withdraw.) So not sure whether representing that as part of the Comparator is the right thing to do. That is, QueryProcessor will need to know that some columns should be bundled together as a Map, but ColumnFamilyStore and beneath won't care.

          Show
          Jonathan Ellis added a comment - - edited we could very well have a UnionType comparator that takes argument and allow a component to be one of different comparator Maybe. Note that we never actually need to compare the different types, since sub-components of types X and Y will always have a different parent component. We just need to allow them. Also, whether the field is a map/list/set is irrelevant for the purposes of the storage engine. (All the operations I propose can be done as a single CT insert operation, without read-before-write. Except for pop, which I didn't think through and I withdraw.) So not sure whether representing that as part of the Comparator is the right thing to do. That is, QueryProcessor will need to know that some columns should be bundled together as a Map, but ColumnFamilyStore and beneath won't care.
          Hide
          Jeremiah Jordan added a comment -

          Except for pop, which I didn't think through and I withdraw.

          The list ops I think would be important are, being able to page through X at a time (starting from the front or back), push front, push back, and remove.

          Show
          Jeremiah Jordan added a comment - Except for pop, which I didn't think through and I withdraw. The list ops I think would be important are, being able to page through X at a time (starting from the front or back), push front, push back, and remove.
          Hide
          Jonathan Ellis added a comment -

          Those are good ideas, but I'm going to be the bad guy in the name of keeping this in-scope for 1.2.

          The crucial idea here is to provide nested collections for convenience; if it's so big you need to page it, it probably belongs in a separate row. So paging is not on my short list to start with.

          I also chose the word "list" over "deque" because I cannot think of a way to provide push-front efficiently (i.e., without read-before-write, and without update-all-existing-list-items). As I mentioned in passing, we can provide append efficiently by using v1 uuids as column names (and translating to list indexes in QueryProcessor), but that doesn't give us anything else for "free." Open to suggestions if you have a better design in mind, of course.

          Show
          Jonathan Ellis added a comment - Those are good ideas, but I'm going to be the bad guy in the name of keeping this in-scope for 1.2. The crucial idea here is to provide nested collections for convenience; if it's so big you need to page it, it probably belongs in a separate row. So paging is not on my short list to start with. I also chose the word "list" over "deque" because I cannot think of a way to provide push-front efficiently (i.e., without read-before-write, and without update-all-existing-list-items). As I mentioned in passing, we can provide append efficiently by using v1 uuids as column names (and translating to list indexes in QueryProcessor), but that doesn't give us anything else for "free." Open to suggestions if you have a better design in mind, of course.
          Hide
          Jeremiah Jordan added a comment -

          Can't you just use negative timestamps? Or is the comparator doing an unsigned compare?

          Show
          Jeremiah Jordan added a comment - Can't you just use negative timestamps? Or is the comparator doing an unsigned compare?
          Hide
          Jonathan Ellis added a comment -

          Well, uuids are better than timestamps since that means you won't lose data if two clients update at the same time. But we can pretend we're using timestamps for simplicity here. Suppose that we're implementing field3.pushbefore(), then. What should the negative timestamp be?

          Show
          Jonathan Ellis added a comment - Well, uuids are better than timestamps since that means you won't lose data if two clients update at the same time. But we can pretend we're using timestamps for simplicity here. Suppose that we're implementing field3.pushbefore(), then. What should the negative timestamp be?
          Hide
          Jeremiah Jordan added a comment -

          I meant setting the timestamp portion of the UUID to the negative of the current time. The timestamp portion of the UUID is how you are getting append/push back, correct?

          field3.pushback(a) -> field3:1 : a
          field3.pushback(b) -> field3:2 : b
          field3.pushfront(c) -> field3:-3 : c
          field3.pushfront(d) -> field3:-4 : d
          field3.pushback(e) -> field3:5 : e
          Gives you:
          (-4, d), (-3, c), (1, a), (2, b), (5, e)

          Show
          Jeremiah Jordan added a comment - I meant setting the timestamp portion of the UUID to the negative of the current time. The timestamp portion of the UUID is how you are getting append/push back, correct? field3.pushback(a) -> field3:1 : a field3.pushback(b) -> field3:2 : b field3.pushfront(c) -> field3:-3 : c field3.pushfront(d) -> field3:-4 : d field3.pushback(e) -> field3:5 : e Gives you: (-4, d), (-3, c), (1, a), (2, b), (5, e)
          Hide
          Jonathan Ellis added a comment -

          Ah, good idea. That should actually work fine. For UUIDs we'd need to pick some point in time as "zero" and then subtract from that, since there is no such thing as a negative time in that context, but that's a minor wrinkle and doesn't matter for the purposes of the list, since the uuid contents are an implementation detail as far as the user is concerned.

          Show
          Jonathan Ellis added a comment - Ah, good idea. That should actually work fine. For UUIDs we'd need to pick some point in time as "zero" and then subtract from that, since there is no such thing as a negative time in that context, but that's a minor wrinkle and doesn't matter for the purposes of the list, since the uuid contents are an implementation detail as far as the user is concerned.
          Hide
          Sylvain Lebresne added a comment -

          Note that we never actually need to compare the different types, since sub-components of types X and Y will always have a different parent component. We just need to allow them

          True, but I guess that was the UnionType part of it, whose goal is to allow that. For MapType and ListType, I agree they don't add much in term of comparison, but I still think it may be a good idea to declare them, even if they are just alias (like ListType would be an alias for UUIDType), just so that map-reduce and other external tool know that we meant a list, not a uuid, by just looking at the comparator (could be handy for debugging too). But I'm probably getting carried away, those are implementation details and may or may not be feasible/desireable. My point was that as far as supporting internally whatever encoding we chose, adding a UnionType would be an option (one I like more than a json encoding personally), and that's fairly trivial to write.

          At the more general level, I'm wondering how to return that to the client, in the ResultSet. It's unclear to me how to support even just map and list in there in a useful way. Of course there is the approach of returning a json string, which I'll admit I'm not a super fan because 1) it's only convenient for those that wants json and 2) it feels at odd with our current API that is not json at all.

          Show
          Sylvain Lebresne added a comment - Note that we never actually need to compare the different types, since sub-components of types X and Y will always have a different parent component. We just need to allow them True, but I guess that was the UnionType part of it, whose goal is to allow that. For MapType and ListType, I agree they don't add much in term of comparison, but I still think it may be a good idea to declare them, even if they are just alias (like ListType would be an alias for UUIDType), just so that map-reduce and other external tool know that we meant a list, not a uuid, by just looking at the comparator (could be handy for debugging too). But I'm probably getting carried away, those are implementation details and may or may not be feasible/desireable. My point was that as far as supporting internally whatever encoding we chose, adding a UnionType would be an option (one I like more than a json encoding personally), and that's fairly trivial to write. At the more general level, I'm wondering how to return that to the client, in the ResultSet. It's unclear to me how to support even just map and list in there in a useful way. Of course there is the approach of returning a json string, which I'll admit I'm not a super fan because 1) it's only convenient for those that wants json and 2) it feels at odd with our current API that is not json at all.
          Hide
          Jonathan Ellis added a comment -

          Well, the ResultSet isn't written in stone itself; it's always been kind of a placeholder pending CASSANDRA-2478. Our custom transport could represent the entire resultset in json (or Smile) if we want, which is the approach unql appears to take: http://www.unqlspec.org/display/UnQL/Example+Queries+and+Usage

          For the existing Thrift transport though I'm not super concerned about it, as long as we come up with something halfway reasonable (which json-encoding qualifies as), I'm okay with it.

          Alternatively, we could use a more compact, custom format leveraging the fact that we know the types involved (and thus don't need to implicitly encode those in an inefficient representation), e.g. for Map number of entries followed by key/value pairs in native binary format.

          Show
          Jonathan Ellis added a comment - Well, the ResultSet isn't written in stone itself; it's always been kind of a placeholder pending CASSANDRA-2478 . Our custom transport could represent the entire resultset in json (or Smile) if we want, which is the approach unql appears to take: http://www.unqlspec.org/display/UnQL/Example+Queries+and+Usage For the existing Thrift transport though I'm not super concerned about it, as long as we come up with something halfway reasonable (which json-encoding qualifies as), I'm okay with it. Alternatively, we could use a more compact, custom format leveraging the fact that we know the types involved (and thus don't need to implicitly encode those in an inefficient representation), e.g. for Map number of entries followed by key/value pairs in native binary format.
          Hide
          Sylvain Lebresne added a comment -

          Yeah I wasn't really thinking straight. I was not worried about how to do it technically, but more about how well drivers would be able to expose that. But handling lists and maps should feel natural in pretty much any language. So don't mind me.

          Show
          Sylvain Lebresne added a comment - Yeah I wasn't really thinking straight. I was not worried about how to do it technically, but more about how well drivers would be able to expose that. But handling lists and maps should feel natural in pretty much any language. So don't mind me.
          Hide
          T Jake Luciani added a comment -

          Also, hive supports complex types we could model this after... https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ComplexTypes

          Show
          T Jake Luciani added a comment - Also, hive supports complex types we could model this after... https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ComplexTypes
          Hide
          Jonathan Ellis added a comment -

          Editing title to reflect reduced scope.

          Not sure that we can really support lists meaningfully. [Sorted] sets is a much clearer mapping to C*, i.e., people don't expect to be able to write "set[3] = foo" and have some magic index -> column name mapping happen.

          Also, another source of prior art is http://www.postgresql.org/docs/9.1/static/hstore.html. I like the use of the || operator for concatenation, but I'm not sure why they invented this whacky one-off syntax for maps instead of using json. Is there a good reason I don't see why json conflicts with SQL? Or did the person who wrote it just like the old Ruby hash syntax (which, I note that Ruby deprecated because it's so cumbersome)?

          Show
          Jonathan Ellis added a comment - Editing title to reflect reduced scope. Not sure that we can really support lists meaningfully. [Sorted] sets is a much clearer mapping to C*, i.e., people don't expect to be able to write "set [3] = foo" and have some magic index -> column name mapping happen. Also, another source of prior art is http://www.postgresql.org/docs/9.1/static/hstore.html . I like the use of the || operator for concatenation, but I'm not sure why they invented this whacky one-off syntax for maps instead of using json. Is there a good reason I don't see why json conflicts with SQL? Or did the person who wrote it just like the old Ruby hash syntax (which, I note that Ruby deprecated because it's so cumbersome)?
          Hide
          Jonathan Ellis added a comment -

          Not sure that we can really support lists meaningfully

          I think I'm leaning back towards supporting lists again. While we can't do random-access against lists, none of the document databases out there do either; fetching L[3] fetches all of L, and updating L[3] = foo rewrites all of L.

          And having syntactic sugar for a Map whose keys are timeuuids is just very useful syntactic sugar.

          So if we make this explicit – that the operations allowed are simply appending, fetching the entire list, and overwriting the entire list (which we can do by writing a composite tombstone, plus the new list items) then I think we're in good shape.

          Show
          Jonathan Ellis added a comment - Not sure that we can really support lists meaningfully I think I'm leaning back towards supporting lists again. While we can't do random-access against lists, none of the document databases out there do either; fetching L [3] fetches all of L, and updating L [3] = foo rewrites all of L. And having syntactic sugar for a Map whose keys are timeuuids is just very useful syntactic sugar. So if we make this explicit – that the operations allowed are simply appending, fetching the entire list, and overwriting the entire list (which we can do by writing a composite tombstone, plus the new list items) then I think we're in good shape.
          Hide
          Jonathan Ellis added a comment - - edited

          One possible syntax, allowing append/add/put for list/set/map and full overwrites. For set and map we can additionally support removal. (I've called this operation discard after the Python Set method for "remove an item if it exists, otherwise do not complain," but alternatives are reasonable.)

          CREATE TABLE foo(
            k uuid PRIMARY KEY,
            L list<int>,
            M map<text, int>,
            S set<int>
          );
          
          UPDATE foo SET L = L.append(1) WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008';
          UPDATE foo SET L = '[2, 3]' WHERE ... ; -- do we need/want to require quoting?
          UPDATE foo SET S = S.add(1) WHERE ... ;
          UPDATE foo SET S = '{2, 3}' WHERE ... ; -- JSON does not define a set type or syntax.  this is Python's syntax for set literals
          UPDATE foo SET S = S.discard(2) WHERE ... ;
          UPDATE foo SET M = M.put('ocd', 1) WHERE ... ;
          UPDATE foo SET M = '{"cod": 2, "dog": 3}' WHERE ... ; -- note double quotes forced if we require quoting the map literal
          UPDATE foo SET M = M.discard('cod') WHERE ... ;
          
          Show
          Jonathan Ellis added a comment - - edited One possible syntax, allowing append/add/put for list/set/map and full overwrites. For set and map we can additionally support removal. (I've called this operation discard after the Python Set method for "remove an item if it exists, otherwise do not complain," but alternatives are reasonable.) CREATE TABLE foo( k uuid PRIMARY KEY, L list< int >, M map<text, int >, S set< int > ); UPDATE foo SET L = L.append(1) WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008'; UPDATE foo SET L = '[2, 3]' WHERE ... ; -- do we need/want to require quoting? UPDATE foo SET S = S.add(1) WHERE ... ; UPDATE foo SET S = '{2, 3}' WHERE ... ; -- JSON does not define a set type or syntax. this is Python's syntax for set literals UPDATE foo SET S = S.discard(2) WHERE ... ; UPDATE foo SET M = M.put('ocd', 1) WHERE ... ; UPDATE foo SET M = '{ "cod" : 2, "dog" : 3}' WHERE ... ; -- note double quotes forced if we require quoting the map literal UPDATE foo SET M = M.discard('cod') WHERE ... ;
          Hide
          Jonathan Ellis added a comment -

          This will allow ripping out the clunky pseudo-map support we have for strategy_options et al.

          Show
          Jonathan Ellis added a comment - This will allow ripping out the clunky pseudo-map support we have for strategy_options et al.
          Hide
          Sylvain Lebresne added a comment -

          I will have a closer look, but I think we can avoid having to quote the literals (it doesn't conflict with any of our current literals), which will look cleaner.

          Show
          Sylvain Lebresne added a comment - I will have a closer look, but I think we can avoid having to quote the literals (it doesn't conflict with any of our current literals), which will look cleaner.
          Hide
          Sylvain Lebresne added a comment -

          Pushed at https://github.com/pcmanus/cassandra/commits/3647 patches for this. This pretty much implement the syntax of Jonathan's comment above, except that literals must not be quoted. Supported methods are:

          • maps: put and discard.
          • sets: add, add_all (takes n arguments) and discard.
          • lits: append, append_all, prepend, prepend_all, set (to set the value at a specific index), discard (discard an object given its value) and discard_idx (discard given the index).

          The implementation for maps and set is rather straighforward. Lists are implemented as a map of TimeUUID -> value. Prepend is supported by the trick described in the previous comments of picking a reference time (1 january 2010 as it turns out) and using that reference minus now as a timestamp. The list method set, discard and discard_idx share in common that they do a read-before-write (we read the whole list to find the TimeUUID corresponding to the element to update; we then write that element (I don't think we need to overwrite the entire list at all)).

          Another detail worth mentioning is literals. When setting a literal we must erase the previous list. To do so, the implementation insert a (range) tombstone at 'timestamp - 1'. This have the unfortunate consequence that if you do 2 update with a literal in the same millisecond, you won't end up with one or the other, but rather as a merge of both literals. I don't see a good way how to do otherwise.

          As of these patches, collections (lists/maps/sets) are limited to table definition that use a composed PRIMARY KEY. The reason is that otherwise the underlying comparator is not a composite one, and thus there is no way to support the collection. I very strongly think that we should change this behavior and I've opened CASSANDRA-4329 for that. But besides this limitation, these patches are ready for review and I've pushed a few tests those in dtests (in cql_tests.py).

          Show
          Sylvain Lebresne added a comment - Pushed at https://github.com/pcmanus/cassandra/commits/3647 patches for this. This pretty much implement the syntax of Jonathan's comment above, except that literals must not be quoted. Supported methods are: maps: put and discard. sets: add, add_all (takes n arguments) and discard. lits: append, append_all, prepend, prepend_all, set (to set the value at a specific index), discard (discard an object given its value) and discard_idx (discard given the index). The implementation for maps and set is rather straighforward. Lists are implemented as a map of TimeUUID -> value. Prepend is supported by the trick described in the previous comments of picking a reference time (1 january 2010 as it turns out) and using that reference minus now as a timestamp. The list method set, discard and discard_idx share in common that they do a read-before-write (we read the whole list to find the TimeUUID corresponding to the element to update; we then write that element (I don't think we need to overwrite the entire list at all)). Another detail worth mentioning is literals. When setting a literal we must erase the previous list. To do so, the implementation insert a (range) tombstone at 'timestamp - 1'. This have the unfortunate consequence that if you do 2 update with a literal in the same millisecond, you won't end up with one or the other, but rather as a merge of both literals. I don't see a good way how to do otherwise. As of these patches, collections (lists/maps/sets) are limited to table definition that use a composed PRIMARY KEY. The reason is that otherwise the underlying comparator is not a composite one, and thus there is no way to support the collection. I very strongly think that we should change this behavior and I've opened CASSANDRA-4329 for that. But besides this limitation, these patches are ready for review and I've pushed a few tests those in dtests (in cql_tests.py).
          Hide
          Sylvain Lebresne added a comment -

          Forgot to mention a few stuffs.

          Currently the result to a query is returned to thrift as JSON (so as a string in JSON format). For sets, it actually returns a list since json has no support of sets.

          Also, the only supported way to query is to query for the full list/map/set. I suppose that we could later had more ways to query, like:

          SELECT L[1] FROM ...;       -- select list element by index
          SELECT M["foo"] FROM ...;   -- select specific map elements
          SELECT S["a":"z"] FROM ...; -- select a slice of a set (since after all our sets and maps are sorted)
          

          But none of this is implemented yet (and I'm keen on pushing that to a follow up ticket).

          Show
          Sylvain Lebresne added a comment - Forgot to mention a few stuffs. Currently the result to a query is returned to thrift as JSON (so as a string in JSON format). For sets, it actually returns a list since json has no support of sets. Also, the only supported way to query is to query for the full list/map/set. I suppose that we could later had more ways to query, like: SELECT L[1] FROM ...; -- select list element by index SELECT M["foo"] FROM ...; -- select specific map elements SELECT S["a":"z"] FROM ...; -- select a slice of a set (since after all our sets and maps are sorted) But none of this is implemented yet (and I'm keen on pushing that to a follow up ticket).
          Hide
          Jonathan Ellis added a comment - - edited

          Sorry, brainstorming again...

          Could we do this instead of put/set? (Do we want to?)

          UPDATE foo SET M[ocd] = 1 WHERE ... ;
          UPDATE foo SET L[0] = 1 WHERE ... ;
          

          UNQL uses dot notation (SET M.odc = 1) but gives no examples of similar sugar for arrays, which leads me to infer that arrays can only be modified as an entire literal at once.

          add_all

          What if we introduced the convention that using the + operator updates the list/set/map? That is,

          SET L = L + [2, 3]
          SET S = S + {4, 5}
          SET M = M + {'asdf': 6, 'fdsa': 7}
          

          If we combined this with the [] syntax for setting a single item, that would just leave us with discard to solve and we wouldn't need this somewhat clunky "method call" syntax.

          Maybe some extension to DELETE?

          Show
          Jonathan Ellis added a comment - - edited Sorry, brainstorming again... Could we do this instead of put/set? (Do we want to?) UPDATE foo SET M[ocd] = 1 WHERE ... ; UPDATE foo SET L[0] = 1 WHERE ... ; UNQL uses dot notation ( SET M.odc = 1 ) but gives no examples of similar sugar for arrays, which leads me to infer that arrays can only be modified as an entire literal at once. add_all What if we introduced the convention that using the + operator updates the list/set/map? That is, SET L = L + [2, 3] SET S = S + {4, 5} SET M = M + {'asdf': 6, 'fdsa': 7} If we combined this with the [] syntax for setting a single item, that would just leave us with discard to solve and we wouldn't need this somewhat clunky "method call" syntax. Maybe some extension to DELETE?
          Hide
          Jonathan Ellis added a comment -

          if you do 2 update with a literal in the same millisecond, you won't end up with one or the other, but rather as a merge of both literals

          That's pretty tough to explain without leaking implementation details, but I guess 1.2 will still be a net win in that respect if we cut out supercolumns.

          the only supported way to query is to query for the full list/map/set

          IMO this is a Good Thing, we don't want to encourage people stuffing their entire model into a single row since that breaks our partitioning assumptions.

          Show
          Jonathan Ellis added a comment - if you do 2 update with a literal in the same millisecond, you won't end up with one or the other, but rather as a merge of both literals That's pretty tough to explain without leaking implementation details, but I guess 1.2 will still be a net win in that respect if we cut out supercolumns. the only supported way to query is to query for the full list/map/set IMO this is a Good Thing, we don't want to encourage people stuffing their entire model into a single row since that breaks our partitioning assumptions.
          Hide
          Jonathan Ellis added a comment -

          Maybe some extension to DELETE?

          DELETE L[0] FROM foo WHERE ...
          DELETE M['asdf'] FROM foo WHERE ...
          DELETE S[4] FROM foo WHERE ...
          

          ?

          Show
          Jonathan Ellis added a comment - Maybe some extension to DELETE? DELETE L[0] FROM foo WHERE ... DELETE M['asdf'] FROM foo WHERE ... DELETE S[4] FROM foo WHERE ... ?
          Hide
          Sylvain Lebresne added a comment -

          If we combined this with the [] syntax for setting a single item, that would just leave us with discard to solve and we wouldn't need this somewhat clunky "method call" syntax.

          That's definitively an option. One drawback I can think of is that I don't see how to support both the discard and discard_idx methods for lists (at least for list<int>), while the clunkier "method call" syntax has no problem with that. I would almost agree that we could offer only discard_idx with no special syntax for discard, if it wasn't for the fact that users can't implement discard as well as we can (because even if they do a read, they won't have access to the underlying TimeUUID, and thus would risk to remove the wrong element if there is concurrent update/delete).

          Maybe some extension to DELETE?

          DELETE L[0] FROM foo WHERE ...
          DELETE M['asdf'] FROM foo WHERE ...
          DELETE S[4] FROM foo WHERE ...
          

          I'm not sure I'm convinced by that syntax for sets, as this reuse a very standard notation for something not standard at all (and I know I suggested a similar for slice for sets above, but upon further reflexion I don't like it). It's also related to the discard/discard_idx problem above since if we find a "better" syntax for sets, we can reuse it for the discard of lists. That being said, I don't have a clearly perfect idea. One suggestion would be to use brackets instead of square brackets, just to make it clear that it selects a value (so DELETE S

          {4}

          FROM ...).

          That's pretty tough to explain without leaking implementation details

          I absolutely agree and I'm certainly not saying it's not a problem. It's just that I don't have a solution to that problem, outside of removing the possibility to "set" a literal at all (which maybe is the right solution).

          Show
          Sylvain Lebresne added a comment - If we combined this with the [] syntax for setting a single item, that would just leave us with discard to solve and we wouldn't need this somewhat clunky "method call" syntax. That's definitively an option. One drawback I can think of is that I don't see how to support both the discard and discard_idx methods for lists (at least for list<int> ), while the clunkier "method call" syntax has no problem with that. I would almost agree that we could offer only discard_idx with no special syntax for discard , if it wasn't for the fact that users can't implement discard as well as we can (because even if they do a read, they won't have access to the underlying TimeUUID, and thus would risk to remove the wrong element if there is concurrent update/delete). Maybe some extension to DELETE? DELETE L[0] FROM foo WHERE ... DELETE M['asdf'] FROM foo WHERE ... DELETE S[4] FROM foo WHERE ... I'm not sure I'm convinced by that syntax for sets, as this reuse a very standard notation for something not standard at all (and I know I suggested a similar for slice for sets above, but upon further reflexion I don't like it). It's also related to the discard/discard_idx problem above since if we find a "better" syntax for sets, we can reuse it for the discard of lists. That being said, I don't have a clearly perfect idea. One suggestion would be to use brackets instead of square brackets, just to make it clear that it selects a value (so DELETE S {4} FROM ...). That's pretty tough to explain without leaking implementation details I absolutely agree and I'm certainly not saying it's not a problem. It's just that I don't have a solution to that problem, outside of removing the possibility to "set" a literal at all (which maybe is the right solution).
          Hide
          Sylvain Lebresne added a comment -

          During the discussions on CASSANDRA-2474, some syntax that were proposed seemed to be a problem for hive. Is there some related problem here? (I have no clue and maybe we don't care, just remembered those previous discussions and figured it's worth asking).

          Show
          Sylvain Lebresne added a comment - During the discussions on CASSANDRA-2474 , some syntax that were proposed seemed to be a problem for hive. Is there some related problem here? (I have no clue and maybe we don't care, just remembered those previous discussions and figured it's worth asking).
          Hide
          Jonathan Ellis added a comment -

          users can't implement discard as well as we can

          I'd argue that if you cared more about the value than the ordering, you should be using a Set instead. And our Sets are inherently ordered (by value), which obviates one of the reasons to use a List when you really mean a Set in many languages.

          I'm not sure I'm convinced by that syntax for sets, as this reuse a very standard notation for something not standard at all

          I'm not sure I follow, what else would S[4] mean? That is: I'm okay with being non-standard, as long as it's not ambiguous.

          some syntax that were proposed seemed to be a problem for hive

          Good point, let's see what Jake says.

          Show
          Jonathan Ellis added a comment - users can't implement discard as well as we can I'd argue that if you cared more about the value than the ordering, you should be using a Set instead. And our Sets are inherently ordered (by value), which obviates one of the reasons to use a List when you really mean a Set in many languages. I'm not sure I'm convinced by that syntax for sets, as this reuse a very standard notation for something not standard at all I'm not sure I follow, what else would S [4] mean? That is: I'm okay with being non-standard, as long as it's not ambiguous. some syntax that were proposed seemed to be a problem for hive Good point, let's see what Jake says.
          Hide
          Lior Golan added a comment -

          Talking about Hive - a question about how you envision Lists/Sets/Maps and Hive integration: Will it be possible to perform a hive query that "joins" against any/all values in a List/Set/Map?

          For example let's say I have the following column families:

          1. Users CF - with row key = user id and a "groups" column for the Set of groups the user belongs to
          2. Groups CF - with row key = group id and a "name" column for group name

          And let's say I want to have a query for the number of users per group (name). In a relational database this would be supported by factoring the relationship between users and groups to a 3rd table ("users_groups"), and performing an inner join between groups and users_groups, grouping by groups.name.

          How will this be supported in Hive (over Cassandra) if the mapping between users and groups is stored as a single "Set" column in the users CF?

          Show
          Lior Golan added a comment - Talking about Hive - a question about how you envision Lists/Sets/Maps and Hive integration: Will it be possible to perform a hive query that "joins" against any/all values in a List/Set/Map? For example let's say I have the following column families: 1. Users CF - with row key = user id and a "groups" column for the Set of groups the user belongs to 2. Groups CF - with row key = group id and a "name" column for group name And let's say I want to have a query for the number of users per group (name). In a relational database this would be supported by factoring the relationship between users and groups to a 3rd table ("users_groups"), and performing an inner join between groups and users_groups, grouping by groups.name. How will this be supported in Hive (over Cassandra) if the mapping between users and groups is stored as a single "Set" column in the users CF?
          Hide
          Jonathan Ellis added a comment -

          https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ArrayOperations and https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ComplexTypes says that Hive is okay with using the [] operator in SELECT. (Presumably also for UPDATE? this is not explicitly shown.)

          Hive does not appear to have a collection literal syntax, we might need to add quotes to ours to make it happy with that.

          Hive adds a size(collection) method, which makes sense.

          Hive does not support Sets.

          Hive's syntax for CREATEing is the same as ours, except that Hive uses "array" instead of "list."

          PostgreSQL's is different (http://www.postgresql.org/docs/9.1/static/arrays.html) and closer to the SQL standard (http://farrago.sourceforge.net/design/CollectionTypes.html) which however requires specifying an array size up front. There doesn't seem to be any precedent here for Maps, and the Multiset [set of tuples] syntax is terrible.

          Personally I'd lean towards keeping the <> syntax for all types. Ambivalent on renaming list to array for better Hive compatibility; I think "list" is better, but maybe not enough better to justify the incompatibility.

          Show
          Jonathan Ellis added a comment - https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ArrayOperations and https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-ComplexTypes says that Hive is okay with using the [] operator in SELECT. (Presumably also for UPDATE? this is not explicitly shown.) Hive does not appear to have a collection literal syntax, we might need to add quotes to ours to make it happy with that. Hive adds a size(collection) method, which makes sense. Hive does not support Sets. Hive's syntax for CREATEing is the same as ours, except that Hive uses "array" instead of "list." PostgreSQL's is different ( http://www.postgresql.org/docs/9.1/static/arrays.html ) and closer to the SQL standard ( http://farrago.sourceforge.net/design/CollectionTypes.html ) which however requires specifying an array size up front. There doesn't seem to be any precedent here for Maps, and the Multiset [set of tuples] syntax is terrible. Personally I'd lean towards keeping the <> syntax for all types. Ambivalent on renaming list to array for better Hive compatibility; I think "list" is better, but maybe not enough better to justify the incompatibility.
          Hide
          Sylvain Lebresne added a comment -

          I'd argue that if you cared more about the value than the ordering, you should be using a Set instead.

          Ok, but what if you care a little bit of both. Or even if you started with a list because you though that it was what you wanted but end up using it like a set and don't want to migrate data. I mean, I'm playing devil's advocate here and I'm not pretending those are very common case, but I do have a problem with limiting the supported features "because we can't come up with a nice syntax for it". That being, I mentioned that "problem" because it came to mind but we can always keep the "method syntax" for those methods for which we don't have a better alternative.

          what else would S[4] mean? That is: I'm okay with being non-standard, as long as it's not ambiguous.

          I'm not saying it's ambiguous, I'm saying there is maybe better. The square bracket notation for lists and maps is use to select a value by key and you can use it to set the value for that key too. This is not really what this would does for sets, as it would rather select a provided value in the set. So I submit that maybe not reusing the square bracket notation may end up being clearer. But this also boils down to the list discard thing. I disagree we shouldn't have a discard for lists, and as it happens DELETE S['foo'] would be exactly the equivalent of L.discard('foo') (where S is a set and L a list), so it feels wrong to reuse the syntax that for list does discard_idx, not discard.

          Show
          Sylvain Lebresne added a comment - I'd argue that if you cared more about the value than the ordering, you should be using a Set instead. Ok, but what if you care a little bit of both. Or even if you started with a list because you though that it was what you wanted but end up using it like a set and don't want to migrate data. I mean, I'm playing devil's advocate here and I'm not pretending those are very common case, but I do have a problem with limiting the supported features "because we can't come up with a nice syntax for it". That being, I mentioned that "problem" because it came to mind but we can always keep the "method syntax" for those methods for which we don't have a better alternative. what else would S [4] mean? That is: I'm okay with being non-standard, as long as it's not ambiguous. I'm not saying it's ambiguous, I'm saying there is maybe better. The square bracket notation for lists and maps is use to select a value by key and you can use it to set the value for that key too. This is not really what this would does for sets, as it would rather select a provided value in the set. So I submit that maybe not reusing the square bracket notation may end up being clearer. But this also boils down to the list discard thing. I disagree we shouldn't have a discard for lists, and as it happens DELETE S ['foo'] would be exactly the equivalent of L.discard('foo') (where S is a set and L a list), so it feels wrong to reuse the syntax that for list does discard_idx, not discard.
          Hide
          Jonathan Ellis added a comment -

          Will it be possible to perform a hive query that "joins" against any/all values in a List/Set/Map

          Answer should be the same as for existing Hive arrays/maps. I didn't see any examples suggesting support for that, but then again I didn't see anything explicitly saying it's not supported either.

          Show
          Jonathan Ellis added a comment - Will it be possible to perform a hive query that "joins" against any/all values in a List/Set/Map Answer should be the same as for existing Hive arrays/maps. I didn't see any examples suggesting support for that, but then again I didn't see anything explicitly saying it's not supported either.
          Hide
          Jonathan Ellis added a comment -

          it feels wrong to reuse the syntax that for list does discard_idx

          Fair point, but I feel like a Set is more like a Map with no value, than a List with no indexes.

          Show
          Jonathan Ellis added a comment - it feels wrong to reuse the syntax that for list does discard_idx Fair point, but I feel like a Set is more like a Map with no value, than a List with no indexes.
          Hide
          Sylvain Lebresne added a comment -

          Fair point, but I feel like a Set is more like a Map with no value, than a List with no indexes

          Ok, let me rephrase a bit my argumentation.

          1. The first point is that "S[4]" for sets is not a standard notation at all. That doesn't mean we cannot use it (there is no standard notation for that kind of operation that I know of anyway), but that does mean that another notation would be equaly good. And not reusing a very standard notation for something new could actually be a good point: I've learned that we don't all have the same intuition when it comes to syntax, and I would halfway expect some people to think that S[4] actually means "the 4th elements of the (sorted) set S".
          2. Then there is the fact that we don't have a syntax for List.discard and I do think there is no good reason not to support it (not having a syntax is definitely not a good reason as far as I'm concerned; I prefer supporting it with a slightly ugly syntax than none at all). And the thing is, List.discard is very much the same operation than Set.discard, or at least it's much closer in semantic than List.discard_idx is of Set.discard. So inventing a syntax for both List.discard and Set.discard would be coherent.

          So the notation I'm suggesting is to use curly brackets instead of square ones, so S

          {4}. The meaning of such would be to 'select the value 4 in the set S if present'.

          I will note that supporting such notation with that meaning would for example allow us, if we so wish, to support "SELECT S{4}

          ..." as a simple way to test the existence of 4 in S (I'm not saying we need to support that, but at least I think it is a bit early to say we will never support such thing). Lastly, not reusing the square bracket notation will make it clear that "SET S[4] = 3" would be non-sensical (if S is a set).

          Show
          Sylvain Lebresne added a comment - Fair point, but I feel like a Set is more like a Map with no value, than a List with no indexes Ok, let me rephrase a bit my argumentation. The first point is that "S [4] " for sets is not a standard notation at all. That doesn't mean we cannot use it (there is no standard notation for that kind of operation that I know of anyway), but that does mean that another notation would be equaly good. And not reusing a very standard notation for something new could actually be a good point: I've learned that we don't all have the same intuition when it comes to syntax, and I would halfway expect some people to think that S [4] actually means "the 4th elements of the (sorted) set S". Then there is the fact that we don't have a syntax for List.discard and I do think there is no good reason not to support it (not having a syntax is definitely not a good reason as far as I'm concerned; I prefer supporting it with a slightly ugly syntax than none at all). And the thing is, List.discard is very much the same operation than Set.discard, or at least it's much closer in semantic than List.discard_idx is of Set.discard. So inventing a syntax for both List.discard and Set.discard would be coherent. So the notation I'm suggesting is to use curly brackets instead of square ones, so S {4}. The meaning of such would be to 'select the value 4 in the set S if present'. I will note that supporting such notation with that meaning would for example allow us, if we so wish, to support "SELECT S{4} ..." as a simple way to test the existence of 4 in S (I'm not saying we need to support that, but at least I think it is a bit early to say we will never support such thing). Lastly, not reusing the square bracket notation will make it clear that "SET S [4] = 3" would be non-sensical (if S is a set).
          Hide
          Rick Shaw added a comment -

          Why isn't the discard syntax as simple as:

          SET L = L - [2, 3]
          SET S = S - {4, 5}
          SET M = M - {'asdf'}
          

          I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position.

          Show
          Rick Shaw added a comment - Why isn't the discard syntax as simple as: SET L = L - [2, 3] SET S = S - {4, 5} SET M = M - {'asdf'} I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position.
          Hide
          Jonathan Ellis added a comment -

          Why isn't the discard syntax as simple as [...]

          I like it!

          I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position

          That's reasonable, but then you still have the problem of Maps that don't really fit either of those. And like I said, a Set feels more like a Map-without-values to me, than like a List-without-indexes.

          Show
          Jonathan Ellis added a comment - Why isn't the discard syntax as simple as [...] I like it! I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position That's reasonable, but then you still have the problem of Maps that don't really fit either of those. And like I said, a Set feels more like a Map-without-values to me, than like a List-without-indexes.
          Hide
          Sylvain Lebresne added a comment -

          I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position

          Are you suggesting that

          SET L = L - [2, 3]
          

          means removing position 2 and 3? Because intuitively I really expect that syntax to remove value 2 and 3. And I'm fine with that syntax for List.discard() btw, but I just want to be clear we agree here.

          But if you mean to use '-' for discard (of list and sets) and keep the "DELETE L[4]" for the discard_idx (for list) and discard_by_key (for map), then I'm good with that.

          Show
          Sylvain Lebresne added a comment - I assume BTW that braces ("[") are signifying a position in an ordered list, and Brackets ("{") are signifying a value (or label) and not a position Are you suggesting that SET L = L - [2, 3] means removing position 2 and 3? Because intuitively I really expect that syntax to remove value 2 and 3. And I'm fine with that syntax for List.discard() btw, but I just want to be clear we agree here. But if you mean to use '-' for discard (of list and sets) and keep the "DELETE L [4] " for the discard_idx (for list) and discard_by_key (for map), then I'm good with that.
          Hide
          Jonathan Ellis added a comment -

          intuitively I really expect that syntax to remove value 2 and 3

          Me, too. [x, y] as "list of x and y" is well-established.

          Show
          Jonathan Ellis added a comment - intuitively I really expect that syntax to remove value 2 and 3 Me, too. [x, y] as "list of x and y" is well-established.
          Hide
          Rick Shaw added a comment -

          But if you mean to use '-' for discard (of list and sets) and keep the "DELETE L[4]" for the discard_idx (for list) and discard_by_key (for map), then I'm good with that.

          I think "DELETE L[4]" makes sense for removing a list item by position (in this context).

          But I think in its Map context,

          SET M = M - {'asdf'}
          SET M = M - {'asdf', 'ghkl'}
          

          seems appropriate to discard a Map entry by label.

          I don't see any reason to remove map entries by position?

          Show
          Rick Shaw added a comment - But if you mean to use '-' for discard (of list and sets) and keep the "DELETE L [4] " for the discard_idx (for list) and discard_by_key (for map), then I'm good with that. I think "DELETE L [4] " makes sense for removing a list item by position (in this context). But I think in its Map context, SET M = M - {'asdf'} SET M = M - {'asdf', 'ghkl'} seems appropriate to discard a Map entry by label. I don't see any reason to remove map entries by position?
          Hide
          Sylvain Lebresne added a comment -

          I've rebased and changed the syntax to match what's above. The new branch is at https://github.com/pcmanus/cassandra/commits/3647-2.

          To sum up the new syntax:

          • lists:
            UPDATE L = L + [ 'a' , 'b' ] WHERE ... // Appends to list
            UPDATE L = [ 'a' , 'b' ] + L WHERE ... // Prepends to list
            UPDATE L[1] = 'c' WHERE ...            // Sets by idx
            UPDATE L = L - [ 'a', 'b' ] WHERE ...  // Remove all occurences of value 'a' and 'b' in list
            DELETE L[1] WHERE ...                  // Deletes by idx
            
          • sets:
            UPDATE S = S + { 'a', 'b' } WHERE ... // Adds to set
            UPDATE S = S - { 'a', 'b' } WHERE ... // Remove values 'a' and 'b' from set
            
          • maps:
            UPDATE M['a'] = 'c' WHERE ...          // Put key,value
            UPDATE M = M + { 'a' : 'c' } WHERE ... // Equivalent to previous
            DELETE M['a'] WHERE ...                // Remove value for key 'a'
            

          A few remarks:

          • We could rename list -> array. I figured one reason to keep list could be to emphasize that there is no predefined size. But I'm good with array.
          • There is no support, for maps, of
            UPDATE M = M - { 'a' : 'c' } WHERE ...
            

            or some other syntax to remove a element of a map by value. The reason is that I don't think we can implement that correctly due to concurrency.

          Show
          Sylvain Lebresne added a comment - I've rebased and changed the syntax to match what's above. The new branch is at https://github.com/pcmanus/cassandra/commits/3647-2 . To sum up the new syntax: lists: UPDATE L = L + [ 'a' , 'b' ] WHERE ... // Appends to list UPDATE L = [ 'a' , 'b' ] + L WHERE ... // Prepends to list UPDATE L[1] = 'c' WHERE ... // Sets by idx UPDATE L = L - [ 'a', 'b' ] WHERE ... // Remove all occurences of value 'a' and 'b' in list DELETE L[1] WHERE ... // Deletes by idx sets: UPDATE S = S + { 'a', 'b' } WHERE ... // Adds to set UPDATE S = S - { 'a', 'b' } WHERE ... // Remove values 'a' and 'b' from set maps: UPDATE M['a'] = 'c' WHERE ... // Put key,value UPDATE M = M + { 'a' : 'c' } WHERE ... // Equivalent to previous DELETE M['a'] WHERE ... // Remove value for key 'a' A few remarks: We could rename list -> array. I figured one reason to keep list could be to emphasize that there is no predefined size. But I'm good with array. There is no support, for maps, of UPDATE M = M - { 'a' : 'c' } WHERE ... or some other syntax to remove a element of a map by value. The reason is that I don't think we can implement that correctly due to concurrency.
          Hide
          Jonathan Ellis added a comment -

          The read-before-write operations on list concern me, since we've avoided these operations thus far (e.g., UPDATE foo SET x=y WHERE w=z). I've been a fan of the status quo since forcing the client to do the read explicitly makes it clear that you're performing a race-prone sequence. Yes, it's less efficient, but in most cases the cost of doing random reads dwarfs the round-trip overhead.

          I'd also note that to my knowledge no other implementation of documents or containers allows efficient updates of individual items. If we force the user to fetch the list, then overwrite the entire list with the desired items removed, we're no worse than the competition.

          Am I off base? Is it time to embrace the race and add this kind of server-side sugar?

          Show
          Jonathan Ellis added a comment - The read-before-write operations on list concern me, since we've avoided these operations thus far (e.g., UPDATE foo SET x=y WHERE w=z ). I've been a fan of the status quo since forcing the client to do the read explicitly makes it clear that you're performing a race-prone sequence. Yes, it's less efficient, but in most cases the cost of doing random reads dwarfs the round-trip overhead. I'd also note that to my knowledge no other implementation of documents or containers allows efficient updates of individual items. If we force the user to fetch the list, then overwrite the entire list with the desired items removed, we're no worse than the competition. Am I off base? Is it time to embrace the race and add this kind of server-side sugar?
          Hide
          Sylvain Lebresne added a comment -

          I've been a fan of the status quo since forcing the client to do the read explicitly makes it clear that you're performing a race-prone sequence.

          I suppose it might depends on the definition of race-prone but I would say that the list operation as implemented are not really race-prone (thanks to the use of TimeUUID for indices instead of plain integers). In particular I'll not that the implementation don't "overwrite the entire list with the desired items removed".

          In particular that means that if 2 clients concurrently set elements i and j in the list, then:

          • either i == j, in which case the usual timestamp resolution rules will apply for element i.
          • or i != j, and we guarantee that both update will be taken into account. So in particular we will not potentially lose one of the insert.

          I'll note that this is the reason why I don't want to implement a discard by value for maps: I don't know how to do that without it being clearly race-prone.

          Now it is true that if a client reads a list and then tries to set the value at index i, there is not guarantee this value will still be the ith value, but I would say that it's mostly us not supporting transactions rather that the list being race-prone.

          Overall I think having set and discard_idx can be useful and have reasonable behavior in face of concurrency. On the other side, if we ask people to read and overwrite the whole list, then it will be much more race-prone .

          Show
          Sylvain Lebresne added a comment - I've been a fan of the status quo since forcing the client to do the read explicitly makes it clear that you're performing a race-prone sequence. I suppose it might depends on the definition of race-prone but I would say that the list operation as implemented are not really race-prone (thanks to the use of TimeUUID for indices instead of plain integers). In particular I'll not that the implementation don't "overwrite the entire list with the desired items removed". In particular that means that if 2 clients concurrently set elements i and j in the list, then: either i == j, in which case the usual timestamp resolution rules will apply for element i. or i != j, and we guarantee that both update will be taken into account. So in particular we will not potentially lose one of the insert. I'll note that this is the reason why I don't want to implement a discard by value for maps: I don't know how to do that without it being clearly race-prone. Now it is true that if a client reads a list and then tries to set the value at index i, there is not guarantee this value will still be the ith value, but I would say that it's mostly us not supporting transactions rather that the list being race-prone. Overall I think having set and discard_idx can be useful and have reasonable behavior in face of concurrency. On the other side, if we ask people to read and overwrite the whole list, then it will be much more race-prone .
          Hide
          Jonathan Ellis added a comment -

          if we ask people to read and overwrite the whole list, then it will be much more race-prone

          Fair enough, I'll buy that.

          Nit: not a huge fan of "int nanos" parameters whose values are really 100s of nanos. Can we have callers multiply by 100 instead, or maybe call it something like "ticks" or "uuidticks" instead?

          Show
          Jonathan Ellis added a comment - if we ask people to read and overwrite the whole list, then it will be much more race-prone Fair enough, I'll buy that. Nit: not a huge fan of "int nanos" parameters whose values are really 100s of nanos. Can we have callers multiply by 100 instead, or maybe call it something like "ticks" or "uuidticks" instead?
          Hide
          T Jake Luciani added a comment -

          How would you avoid a race with Updating a list by idx if there is a prepending insert happening?

          Show
          T Jake Luciani added a comment - How would you avoid a race with Updating a list by idx if there is a prepending insert happening?
          Hide
          Jonathan Ellis added a comment -

          You either get the pre-prepend index, or the post-.

          So no, it doesn't avoid races ("don't do that if it hurts"), but it gives you a sane result.

          Show
          Jonathan Ellis added a comment - You either get the pre-prepend index, or the post-. So no, it doesn't avoid races ("don't do that if it hurts"), but it gives you a sane result.
          Hide
          Pavel Yaskevich added a comment - - edited

          Here is my comments/questions after code review:

          • AbstractType Wouldn't it be a good idea to add a "boolean isCollectionType()" method which by default would return "false"? That would allow to avoid "instanceof" checks which are done all over the place. It's probably time to do the same for CompositeType as we get more and more instanceof checks for it as well.
          • TypeParser method "stringifyCollectionsParameters" - The following is not DRY as the same actions repeated in all clauses, when to append(',') could be distinguished with a condition instead of code duplication.
          • Value I think we should add a way to distinguish between different types of values without using "instanceof" all them. In combination with common iteration method (asList() from the last patch) for all of them that would allow to remove "instanceof" checks and "cast" everywhere.
          • UpdateStatement starting from line 199 - only difference between "instanceof" cases is pre-validation which could be added to the separate method in Value for example and "addMutation" call with different types this also proves previous point about Value.
          • UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ?
          • UpdateStatement line 407 "definition" should be "definitions"
          • CollectionType line 89 ListType line 147 "argument" should be "arguments"
          Show
          Pavel Yaskevich added a comment - - edited Here is my comments/questions after code review: AbstractType Wouldn't it be a good idea to add a "boolean isCollectionType()" method which by default would return "false"? That would allow to avoid "instanceof" checks which are done all over the place. It's probably time to do the same for CompositeType as we get more and more instanceof checks for it as well. TypeParser method "stringifyCollectionsParameters" - The following is not DRY as the same actions repeated in all clauses, when to append(',') could be distinguished with a condition instead of code duplication. Value I think we should add a way to distinguish between different types of values without using "instanceof" all them. In combination with common iteration method (asList() from the last patch) for all of them that would allow to remove "instanceof" checks and "cast" everywhere. UpdateStatement starting from line 199 - only difference between "instanceof" cases is pre-validation which could be added to the separate method in Value for example and "addMutation" call with different types this also proves previous point about Value. UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ? UpdateStatement line 407 "definition" should be "definitions" CollectionType line 89 ListType line 147 "argument" should be "arguments"
          Hide
          Sylvain Lebresne added a comment -

          I've pushed a v3 at https://github.com/pcmanus/cassandra/commits/3647-3 that is rebased and adds one last patch to address some of the remarks. More precisely:

          Wouldn't it be a good idea to add a "boolean isCollectionType()" method which by default would return "false"?

          I don't know actually. That's definitively an option but on the other side I wonder what that would really get us. I know instanceof has bad reputation, and I certainly agree that we shouldn't overuse it, but I don't think we should avoid it at all cost either. In most instanceof usage for CollectionType and CompositeType, a cast follows (and in most case I don't see how to refactor to avoid those casts without being uber ugly, though I'm open to suggestion), and if your going to cast, I think testing with instanceof is actually safer than using a boolean method.

          when to append(',') could be distinguished with a condition instead of code duplication

          Changed.

          • Value I think we should add a way to distinguish between different types of values without using "instanceof" all them
          • UpdateStatement starting from line 199 - only difference between "instanceof" cases is pre-validation which could be added to the separate method in Value

          If I understand correctly those are related. I've refactored Value.java and UpdateStatement a bit to merge the code dealing with the different literals. It does not eliminate all the instanceof and casts, but I think the remaining one are ok (that is, I don't see a clearly better way to do the same thing without the instanceof).

          UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ?

          I don't understand what you are suggesting.

          "definition" should be "definitions"

          Fixed.

          CollectionType line 89 ListType line 147 "argument" should be "arguments"

          I didn't found any instance of "argument" on those line. Maybe they were in the first patches but removed by the later ones?

          Show
          Sylvain Lebresne added a comment - I've pushed a v3 at https://github.com/pcmanus/cassandra/commits/3647-3 that is rebased and adds one last patch to address some of the remarks. More precisely: Wouldn't it be a good idea to add a "boolean isCollectionType()" method which by default would return "false"? I don't know actually. That's definitively an option but on the other side I wonder what that would really get us. I know instanceof has bad reputation, and I certainly agree that we shouldn't overuse it, but I don't think we should avoid it at all cost either. In most instanceof usage for CollectionType and CompositeType, a cast follows (and in most case I don't see how to refactor to avoid those casts without being uber ugly, though I'm open to suggestion), and if your going to cast, I think testing with instanceof is actually safer than using a boolean method. when to append(',') could be distinguished with a condition instead of code duplication Changed. Value I think we should add a way to distinguish between different types of values without using "instanceof" all them UpdateStatement starting from line 199 - only difference between "instanceof" cases is pre-validation which could be added to the separate method in Value If I understand correctly those are related. I've refactored Value.java and UpdateStatement a bit to merge the code dealing with the different literals. It does not eliminate all the instanceof and casts, but I think the remaining one are ok (that is, I don't see a clearly better way to do the same thing without the instanceof). UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ? I don't understand what you are suggesting. "definition" should be "definitions" Fixed. CollectionType line 89 ListType line 147 "argument" should be "arguments" I didn't found any instance of "argument" on those line. Maybe they were in the first patches but removed by the later ones?
          Hide
          Pavel Yaskevich added a comment -

          I don't know actually. That's definitively an option but on the other side I wonder what that would really get us. I know instanceof has bad reputation, and I certainly agree that we shouldn't overuse it, but I don't think we should avoid it at all cost either. In most instanceof usage for CollectionType and CompositeType, a cast follows (and in most case I don't see how to refactor to avoid those casts without being uber ugly, though I'm open to suggestion), and if your going to cast, I think testing with instanceof is actually safer than using a boolean method.

          This is the main problem on my opinion - that we need a casts/instanceof checks which is a chronic problem of type hierarchy of o.a.c.db.marshal package related to Composite and Collection types, I think we should reflect their most commonly used functionality in AbstractType.

          If I understand correctly those are related. I've refactored Value.java and UpdateStatement a bit to merge the code dealing with the different literals. It does not eliminate all the instanceof and casts, but I think the remaining one are ok (that is, I don't see a clearly better way to do the same thing without the instanceof).

          I like what you did there, how about we go further and move validateType(CFDefinition.Name) and constructionFunction() to Value and remove (or rename Value to) Literal as all of the classes implement only that one interface, so it would be someting like Literal.

          {List, Set, Map}

          and you would be able to "assert instance of Literal" in UpdateStatement?...

          "definition" should be "definitions"

          There are couple of same typos in the UpdateStatement left - lines 341, 349 and 373

          UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ?

          That was fixed in the last commits so no problem.

          CollectionType line 89 ListType line 147 "argument" should be "arguments"

          That seems to be removed too, sorry, I have seen they in a first commits.

          Show
          Pavel Yaskevich added a comment - I don't know actually. That's definitively an option but on the other side I wonder what that would really get us. I know instanceof has bad reputation, and I certainly agree that we shouldn't overuse it, but I don't think we should avoid it at all cost either. In most instanceof usage for CollectionType and CompositeType, a cast follows (and in most case I don't see how to refactor to avoid those casts without being uber ugly, though I'm open to suggestion), and if your going to cast, I think testing with instanceof is actually safer than using a boolean method. This is the main problem on my opinion - that we need a casts/instanceof checks which is a chronic problem of type hierarchy of o.a.c.db.marshal package related to Composite and Collection types, I think we should reflect their most commonly used functionality in AbstractType. If I understand correctly those are related. I've refactored Value.java and UpdateStatement a bit to merge the code dealing with the different literals. It does not eliminate all the instanceof and casts, but I think the remaining one are ok (that is, I don't see a clearly better way to do the same thing without the instanceof). I like what you did there, how about we go further and move validateType(CFDefinition.Name) and constructionFunction() to Value and remove (or rename Value to) Literal as all of the classes implement only that one interface, so it would be someting like Literal. {List, Set, Map} and you would be able to "assert instance of Literal" in UpdateStatement?... "definition" should be "definitions" There are couple of same typos in the UpdateStatement left - lines 341 , 349 and 373 UpdateStatement "mutationForKey" method - do we need to enforce using "group" as a last parameter all the time, even when we set it to "null" ? That was fixed in the last commits so no problem. CollectionType line 89 ListType line 147 "argument" should be "arguments" That seems to be removed too, sorry, I have seen they in a first commits.
          Hide
          Sylvain Lebresne added a comment -

          that we need a casts/instanceof checks which is a chronic problem of type hierarchy of o.a.c.db.marshal package related to Composite and Collection types

          We test 6 times for "instanceof CollectionType": one time is to call serializeForThrift() and 2 other times are to call the execute() methods. Both of those are very specific to CollectionType currently and I'm not convainced at all moving these "functionality" to AbstractType would make sense/be cleaner. One other time, we cast to CollectionType in order to construct a ColumnToCollectionType so we'll have to do that cast anyway. Remains 2 occurences were we could indeed replace the 'instanceof' by a 'isCollection()' method (we don't cast to CollectionType after the instanceof test), but again I'm not sure there is a point? It won't be measurably faster and it won't even save characters (since we'll have to define the isCollection method).

          As for CompositeType, there is (only) 4 occurrences of "instanceof CompositeType", all of which are followed by a cast so that we can access the 'types' field of CompositeType. I don't see any sane way to move that functionality into AbstractType.

          So honestly I think we actually do a reasonably good job of 'reflecting their most commonly used functionality in AbstractType' and in practice I don't really see how to do better.

          remove (or rename Value to) Literal as all of the classes implement only that one interface, so it would be someting like Literal.{List, Set, Map} and you would be able to "assert instance of Literal" in UpdateStatement?

          We cannot do that because Term implements Value. The point of Value is to be a Term or a Literal.

          There are couple of same typos in the UpdateStatement left - lines 341, 349 and 373

          I'll update those.

          Show
          Sylvain Lebresne added a comment - that we need a casts/instanceof checks which is a chronic problem of type hierarchy of o.a.c.db.marshal package related to Composite and Collection types We test 6 times for "instanceof CollectionType": one time is to call serializeForThrift() and 2 other times are to call the execute() methods. Both of those are very specific to CollectionType currently and I'm not convainced at all moving these "functionality" to AbstractType would make sense/be cleaner. One other time, we cast to CollectionType in order to construct a ColumnToCollectionType so we'll have to do that cast anyway. Remains 2 occurences were we could indeed replace the 'instanceof' by a 'isCollection()' method (we don't cast to CollectionType after the instanceof test), but again I'm not sure there is a point? It won't be measurably faster and it won't even save characters (since we'll have to define the isCollection method). As for CompositeType, there is (only) 4 occurrences of "instanceof CompositeType", all of which are followed by a cast so that we can access the 'types' field of CompositeType. I don't see any sane way to move that functionality into AbstractType. So honestly I think we actually do a reasonably good job of 'reflecting their most commonly used functionality in AbstractType' and in practice I don't really see how to do better. remove (or rename Value to) Literal as all of the classes implement only that one interface, so it would be someting like Literal.{List, Set, Map} and you would be able to "assert instance of Literal" in UpdateStatement? We cannot do that because Term implements Value. The point of Value is to be a Term or a Literal. There are couple of same typos in the UpdateStatement left - lines 341 , 349 and 373 I'll update those.
          Hide
          Pavel Yaskevich added a comment - - edited

          We test 6 times for "instanceof CollectionType": one time is to call serializeForThrift() and 2 other times are to call the execute() methods...

          I wouldn't actually judge this by current usages of instanceof/casts because what they show bad tendency where users are forced to use downcasts to get things which are natural to the complex types but not reflected in the common AbstractType interface. Every time instanceof is used it indicates that components' OO design is broken.

          If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method. Simple types would return just one component - themselves (or throw an exception saying that type is not complex to be sure that method is not misused) and complex ones would be able to return immutable list of their component types, in combination with isComposite() method that would eliminate instanceof/downcasts completely.

          I don't currently clearly see how to be with CollectionType.execute

          {Function}

          methods but I will think about it.

          We cannot do that because Term implements Value. The point of Value is to be a Term or a Literal.

          That means that you are confusing semantics of the "literal" e.g. https://en.wikipedia.org/wiki/Literal_(computer_programming) , so Term actually should become

          {Unary | Single}

          Literal and implement Literal where

          {List, Set, Map}

          classes would have UnaryLiteral as base element, there is no need to invent hierarchy like Value.Literal just to make it fit with current Term implementation.

          Show
          Pavel Yaskevich added a comment - - edited We test 6 times for "instanceof CollectionType": one time is to call serializeForThrift() and 2 other times are to call the execute() methods... I wouldn't actually judge this by current usages of instanceof/casts because what they show bad tendency where users are forced to use downcasts to get things which are natural to the complex types but not reflected in the common AbstractType interface. Every time instanceof is used it indicates that components' OO design is broken. If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method. Simple types would return just one component - themselves (or throw an exception saying that type is not complex to be sure that method is not misused) and complex ones would be able to return immutable list of their component types, in combination with isComposite() method that would eliminate instanceof/downcasts completely. I don't currently clearly see how to be with CollectionType.execute {Function} methods but I will think about it. We cannot do that because Term implements Value. The point of Value is to be a Term or a Literal. That means that you are confusing semantics of the "literal" e.g. https://en.wikipedia.org/wiki/Literal_(computer_programming ) , so Term actually should become {Unary | Single} Literal and implement Literal where {List, Set, Map} classes would have UnaryLiteral as base element, there is no need to invent hierarchy like Value.Literal just to make it fit with current Term implementation.
          Hide
          Sylvain Lebresne added a comment -

          Every time instanceof is used it indicates that components' OO design is broken.

          Ok fine. I don't agree (more precisely I don't think perfect OO design at all cost is necessarily superior) but I'm not interested in that debate, at least not on that ticket. So do you have concrete proposition for that ticket? If not, I suggest we consider it good enough for now, and feel free to open a follow up ticket with a clean refactor that follow OO design to the letter.

          That means that you are confusing semantics of the "literal"

          Please Pavel, let's stop with that. I don't fucking care about wikipedia definitions and thanks but I'm not confused by any semantic.

          So yes, when I say Literal it's a name for the class used to represent the collection literals. If you find the naming confusing, then fine, we can call the class CollectionLiteral (that's even a good idea).

          Now the fact is that we need a class to represent those collection literals and we need a common ancestor to that class and the Term class. Making the literals being of class Term (or a subclass) would be ugly because none of the method of Term apply to the (collection) literals. But the literals also have methods that make no sense for Term (namely validateType, isEmpty and constructFunction), so we don't want to put those methods in the interface shared with Term either.

          So I don't see a cleaner to having Term, Value and Literal (though again, I'm fine changing the naming) and I'm not confused about that.

          And if we're being pedantic on naming, Term should not be renamed to UnaryLiteral because it does not only represent literals, but also bind variables.

          Show
          Sylvain Lebresne added a comment - Every time instanceof is used it indicates that components' OO design is broken. Ok fine. I don't agree (more precisely I don't think perfect OO design at all cost is necessarily superior) but I'm not interested in that debate, at least not on that ticket. So do you have concrete proposition for that ticket? If not, I suggest we consider it good enough for now, and feel free to open a follow up ticket with a clean refactor that follow OO design to the letter. That means that you are confusing semantics of the "literal" Please Pavel, let's stop with that. I don't fucking care about wikipedia definitions and thanks but I'm not confused by any semantic. So yes, when I say Literal it's a name for the class used to represent the collection literals. If you find the naming confusing, then fine, we can call the class CollectionLiteral (that's even a good idea). Now the fact is that we need a class to represent those collection literals and we need a common ancestor to that class and the Term class. Making the literals being of class Term (or a subclass) would be ugly because none of the method of Term apply to the (collection) literals. But the literals also have methods that make no sense for Term (namely validateType, isEmpty and constructFunction), so we don't want to put those methods in the interface shared with Term either. So I don't see a cleaner to having Term, Value and Literal (though again, I'm fine changing the naming) and I'm not confused about that. And if we're being pedantic on naming, Term should not be renamed to UnaryLiteral because it does not only represent literals, but also bind variables.
          Hide
          Pavel Yaskevich added a comment -

          Ok fine. I don't agree (more precisely I don't think perfect OO design at all cost is necessarily superior) but I'm not interested in that debate, at least not on that ticket. So do you have concrete proposition for that ticket? If not, I suggest we consider it good enough for now, and feel free to open a follow up ticket with a clean refactor that follow OO design to the letter.

          This is what I wrote in the previous ticket - If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method. Simple types would return just one component - themselves (or throw an exception saying that type is not complex to be sure that method is not misused) and complex ones would be able to return immutable list of their component types, in combination with isComposite() method that would eliminate instanceof/downcasts completely.

          Now the fact is that we need a class to represent those collection literals and we need a common ancestor to that class and the Term class. Making the literals being of class Term (or a subclass) would be ugly because none of the method of Term apply to the (collection) literals. But the literals also have methods that make no sense for Term (namely validateType, isEmpty and constructFunction), so we don't want to put those methods in the interface shared with Term either.

          And why do we need the common ancestor so much if List/Set/Map don't have anything in common with Term except those classes are it's containers?

          Show
          Pavel Yaskevich added a comment - Ok fine. I don't agree (more precisely I don't think perfect OO design at all cost is necessarily superior) but I'm not interested in that debate, at least not on that ticket. So do you have concrete proposition for that ticket? If not, I suggest we consider it good enough for now, and feel free to open a follow up ticket with a clean refactor that follow OO design to the letter. This is what I wrote in the previous ticket - If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method. Simple types would return just one component - themselves (or throw an exception saying that type is not complex to be sure that method is not misused) and complex ones would be able to return immutable list of their component types, in combination with isComposite() method that would eliminate instanceof/downcasts completely. Now the fact is that we need a class to represent those collection literals and we need a common ancestor to that class and the Term class. Making the literals being of class Term (or a subclass) would be ugly because none of the method of Term apply to the (collection) literals. But the literals also have methods that make no sense for Term (namely validateType, isEmpty and constructFunction), so we don't want to put those methods in the interface shared with Term either. And why do we need the common ancestor so much if List/Set/Map don't have anything in common with Term except those classes are it's containers?
          Hide
          Sylvain Lebresne added a comment -

          If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method

          So 100% of the time, when we access the "types" field the code is specific to compositeType, so we would still have to test if the type is composite and we would never call the getComponentTypes for non-composite type. Hence the only thing that would change is that we would replace "if (type instanceof CompositeType)" by "if (type.isCompositeType())" and "((CompositeType)type).types" by "type.getComponentTypes()" which is imho exactly the same thing (allergic reaction to instanceof/casts put aside). Except that in the meantime you'll have polluted the interface of AbstractType with 2 new methods. And it is less safe because if someone writes a custom type that happens to change the implementation of those methods, it will almost surely be a bug. So I'm not convinced at all by that idea.

          But don't get me wrong, if I were to redesign C* from the ground up, I would probably make composite a more "native" notion and everything would be a composite (with potentially only 1 component) and composites wouldn't be a subclass of AbstractType at all. But this is not really an option, at least not on the short term.

          On a more general "design" level, in the current code, an AbstractType in general don't have "components types" so I don't think we should add a getComponentTypes() method. But since all that has nothing to do with this ticket, if you think it would be a good idea, I can only encourage you to open a separate ticket on which we could have a more focused discussion.

          And why do we need the common ancestor

          Because we want Operation.Set and 'List<Value> columnValues' in UpdateStatement to be able to contain both Term and collection Literal (i.e. Value is a marker interface basically).

          List/Set/Map don't have anything in common with Term

          That's not what I said. And in fact Value defines the asList() method that both Term and List/Set/Map implement. What I said is that List/Set/Map also have most of their methods that don't apply to Term and Term have most of his method that don't apply to List/Set/Map. Having the intersection non-empty don't mean one is contain in the other.

          Note that I'm not saying it's the only design. We probably could duplicate Operation.Set to have one applying to Term and one applying to collection literals and split UpdateStatement.columnValues into two lists, but imho having the Value interface to avoid the code duplication is cleaner.

          Show
          Sylvain Lebresne added a comment - If the most common thing we do after cast with CompositeType is get "types" field then we could add Collection<AbstractType<?>> getComponentTypes() method So 100% of the time, when we access the "types" field the code is specific to compositeType, so we would still have to test if the type is composite and we would never call the getComponentTypes for non-composite type. Hence the only thing that would change is that we would replace "if (type instanceof CompositeType)" by "if (type.isCompositeType())" and "((CompositeType)type).types" by "type.getComponentTypes()" which is imho exactly the same thing (allergic reaction to instanceof/casts put aside). Except that in the meantime you'll have polluted the interface of AbstractType with 2 new methods. And it is less safe because if someone writes a custom type that happens to change the implementation of those methods, it will almost surely be a bug. So I'm not convinced at all by that idea. But don't get me wrong, if I were to redesign C* from the ground up, I would probably make composite a more "native" notion and everything would be a composite (with potentially only 1 component) and composites wouldn't be a subclass of AbstractType at all. But this is not really an option, at least not on the short term. On a more general "design" level, in the current code, an AbstractType in general don't have "components types" so I don't think we should add a getComponentTypes() method. But since all that has nothing to do with this ticket, if you think it would be a good idea, I can only encourage you to open a separate ticket on which we could have a more focused discussion. And why do we need the common ancestor Because we want Operation.Set and 'List<Value> columnValues' in UpdateStatement to be able to contain both Term and collection Literal (i.e. Value is a marker interface basically). List/Set/Map don't have anything in common with Term That's not what I said. And in fact Value defines the asList() method that both Term and List/Set/Map implement. What I said is that List/Set/Map also have most of their methods that don't apply to Term and Term have most of his method that don't apply to List/Set/Map. Having the intersection non-empty don't mean one is contain in the other. Note that I'm not saying it's the only design. We probably could duplicate Operation.Set to have one applying to Term and one applying to collection literals and split UpdateStatement.columnValues into two lists, but imho having the Value interface to avoid the code duplication is cleaner.
          Hide
          Pavel Yaskevich added a comment -

          I think I put my position on both points as clear as I could. Somebody else would have to take a look at this as well, I am not convinced that (at least) approuch taken for collection representation such as term/value/literal interconnection is a correct way to go. Code also adds few new types to o.a.c.db.marshal that make AbstractType even worse interface with new "execute" operations added.

          Show
          Pavel Yaskevich added a comment - I think I put my position on both points as clear as I could. Somebody else would have to take a look at this as well, I am not convinced that (at least) approuch taken for collection representation such as term/value/literal interconnection is a correct way to go. Code also adds few new types to o.a.c.db.marshal that make AbstractType even worse interface with new "execute" operations added.
          Hide
          Jonathan Ellis added a comment -

          Pushed some tweaks to https://github.com/jbellis/cassandra/commits/3647-4. In particular, renamed the position compare/validate overloads to be more clear what they are for. With that clarified, I think it's worth letting polymorphism do the branch here, so having them in AbstractType is a good thing.

          I agree that superficially isComposite() would be a good addition to AbstractType... but the way we use instanceof here, I think DCT would have to return false, which would be pretty confusing. So instanceof looks like the lesser evil.

          I think the current approach for collection representation at the Antlr level is reasonable. It may be that Pavel's idea is better but I don't think we're going to make more progress on that w/o actually coding it up. So if you want to give that a try in a separate ticket Pavel, that's fine, but I don't think we should block this.

          Overall +1 from me.

          Show
          Jonathan Ellis added a comment - Pushed some tweaks to https://github.com/jbellis/cassandra/commits/3647-4 . In particular, renamed the position compare/validate overloads to be more clear what they are for. With that clarified, I think it's worth letting polymorphism do the branch here, so having them in AbstractType is a good thing. I agree that superficially isComposite() would be a good addition to AbstractType... but the way we use instanceof here, I think DCT would have to return false, which would be pretty confusing. So instanceof looks like the lesser evil. I think the current approach for collection representation at the Antlr level is reasonable. It may be that Pavel's idea is better but I don't think we're going to make more progress on that w/o actually coding it up. So if you want to give that a try in a separate ticket Pavel, that's fine, but I don't think we should block this. Overall +1 from me.
          Hide
          Pavel Yaskevich added a comment -

          I think the current approach for collection representation at the Antlr level is reasonable. It may be that Pavel's idea is better but I don't think we're going to make more progress on that w/o actually coding it up. So if you want to give that a try in a separate ticket Pavel, that's fine, but I don't think we should block this.

          I believe we should block this and even more - I think we should make fine-grained sub-issues to cover changes, like:

          • "Prepare marshal package for collection type support", which would add collection types and make AbstractType happy with them;
          • "CQL3 grammar support for 'collection' operations on list/set/map", which would add a ANTLR code to support required operations on the grammar level without using shaky combinations like term/value/literal;
          • and this one would a glue for the two described above.

          because as in the current state marshal is broken in a way that it requires instanceof checks, this one pushes it to the whole new level by introducing functionality that "type" should not have by it's nature e.g. "execute". I'm not fond of signing this up until we do something about it or we are building a house without proper fundament.

          Show
          Pavel Yaskevich added a comment - I think the current approach for collection representation at the Antlr level is reasonable. It may be that Pavel's idea is better but I don't think we're going to make more progress on that w/o actually coding it up. So if you want to give that a try in a separate ticket Pavel, that's fine, but I don't think we should block this. I believe we should block this and even more - I think we should make fine-grained sub-issues to cover changes, like: "Prepare marshal package for collection type support", which would add collection types and make AbstractType happy with them; "CQL3 grammar support for 'collection' operations on list/set/map", which would add a ANTLR code to support required operations on the grammar level without using shaky combinations like term/value/literal; and this one would a glue for the two described above. because as in the current state marshal is broken in a way that it requires instanceof checks, this one pushes it to the whole new level by introducing functionality that "type" should not have by it's nature e.g. "execute". I'm not fond of signing this up until we do something about it or we are building a house without proper fundament.
          Hide
          Sylvain Lebresne added a comment -

          I believe we should block this

          This is completely ridiculous.

          I don't pretend that this patch is perfect (I never pretend such a thing) but suggesting that the style of this patch is so bad that we should block this and redo it is wrong and frankly insulting.

          And thruth being told, considering that almost all tickets are some form of compromise (as they should be) between getting things done and refactoring too much code to achieve some notion of code beauty, I'm actually reasonably happy with this patch.

          Besides, this is a feature that people are excited about for CQL3 (and some ticket like CASSANDRA-4351 could make a good use of it), so we definitively don't want to delay that unless we have a very good reason.

          Anyway, I've committed this (based on Jonathan's review). But I'm impatiently waiting for Pavel's patch that will refactor the hell out of this to give us solid foundation.

          Show
          Sylvain Lebresne added a comment - I believe we should block this This is completely ridiculous. I don't pretend that this patch is perfect (I never pretend such a thing) but suggesting that the style of this patch is so bad that we should block this and redo it is wrong and frankly insulting. And thruth being told, considering that almost all tickets are some form of compromise (as they should be) between getting things done and refactoring too much code to achieve some notion of code beauty, I'm actually reasonably happy with this patch. Besides, this is a feature that people are excited about for CQL3 (and some ticket like CASSANDRA-4351 could make a good use of it), so we definitively don't want to delay that unless we have a very good reason. Anyway, I've committed this (based on Jonathan's review). But I'm impatiently waiting for Pavel's patch that will refactor the hell out of this to give us solid foundation.
          Hide
          Pavel Yaskevich added a comment -

          Reopening this because it because committed code doesn't work and attaching my alternative version with changes I was taking about in my previous comments.

          When I try to do SET operation on any collection type I get following exception:

          java.lang.IllegalStateException: Composite column is already fully constructed
                    at org.apache.cassandra.db.marshal.CompositeType$Builder.buildAsEndOfRange(CompositeType.java:290)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:250)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171)
                    at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84)
                    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
                    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116)
                    at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
                    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
                    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
                    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:680)
          

          Following exceptions observed on List.Append:

          java.lang.IllegalStateException: Composite column is already fully constructed
                    at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:264)
                    at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:196)
                    at org.apache.cassandra.db.marshal.ListType.doAppend(ListType.java:197)
                    at org.apache.cassandra.db.marshal.ListType.executeFunction(ListType.java:128)
                    at org.apache.cassandra.db.marshal.CollectionType.execute(CollectionType.java:84)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:286)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171)
                    at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84)
                    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
                    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116)
                    at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
                    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
                    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
                    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:680)
          

          And Set.Add

          ava.lang.IllegalStateException: Composite column is already fully constructed
                    at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:264)
                    at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:196)
                    at org.apache.cassandra.db.marshal.SetType.doAdd(SetType.java:111)
                    at org.apache.cassandra.db.marshal.SetType.executeFunction(SetType.java:96)
                    at org.apache.cassandra.db.marshal.CollectionType.execute(CollectionType.java:84)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:286)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218)
                    at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171)
                    at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84)
                    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
                    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116)
                    at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
                    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
                    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
                    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
                    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:680)
          

          Map.Put has the exact same problem but I won't add stacktrace here for brevity.

          My patch, if accepted, could be taken as a ground to fix previously mentioned issues.

          Also I think I can do something about CollectionType.serializeForThrift if needed.

          Show
          Pavel Yaskevich added a comment - Reopening this because it because committed code doesn't work and attaching my alternative version with changes I was taking about in my previous comments. When I try to do SET operation on any collection type I get following exception: java.lang.IllegalStateException: Composite column is already fully constructed at org.apache.cassandra.db.marshal.CompositeType$Builder.buildAsEndOfRange(CompositeType.java:290) at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:250) at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218) at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171) at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116) at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Following exceptions observed on List.Append: java.lang.IllegalStateException: Composite column is already fully constructed at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:264) at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:196) at org.apache.cassandra.db.marshal.ListType.doAppend(ListType.java:197) at org.apache.cassandra.db.marshal.ListType.executeFunction(ListType.java:128) at org.apache.cassandra.db.marshal.CollectionType.execute(CollectionType.java:84) at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:286) at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218) at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171) at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116) at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) And Set.Add ava.lang.IllegalStateException: Composite column is already fully constructed at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:264) at org.apache.cassandra.db.marshal.CompositeType$Builder.add(CompositeType.java:196) at org.apache.cassandra.db.marshal.SetType.doAdd(SetType.java:111) at org.apache.cassandra.db.marshal.SetType.executeFunction(SetType.java:96) at org.apache.cassandra.db.marshal.CollectionType.execute(CollectionType.java:84) at org.apache.cassandra.cql3.statements.UpdateStatement.addToMutation(UpdateStatement.java:286) at org.apache.cassandra.cql3.statements.UpdateStatement.mutationForKey(UpdateStatement.java:218) at org.apache.cassandra.cql3.statements.UpdateStatement.getMutations(UpdateStatement.java:171) at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:84) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:116) at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1240) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:184) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Map.Put has the exact same problem but I won't add stacktrace here for brevity. My patch, if accepted, could be taken as a ground to fix previously mentioned issues. Also I think I can do something about CollectionType.serializeForThrift if needed.
          Hide
          Sylvain Lebresne added a comment -

          When I try to do SET operation on any collection type I get following exception

          What would be useful is the query used to trigger that exception. I'll note that there is some basic tests for this at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (set_test, list_test and map_test) which have some SET operation and are working correctly. And btw, everyone is very much welcome to help adding/improving those tests.

          attaching my alternative version

          That refactor lgtm on principle, however:

          • For lists, ListOperation.SET is used for both 'set a column to a literal' and 'set a list element by index', so setting a literal is broken. Nit: for map and sets, the behavior of doSet() relay on the fallthrough of the switch statement, it would be nice to add a comment explaining that the falltrhough is on purpose.
          • In UpdateStatement.prepare(), you replaced the check !(value instanceof Term) by operation.getType() == Operation.Type.COLUMN, which is reversed.
          Show
          Sylvain Lebresne added a comment - When I try to do SET operation on any collection type I get following exception What would be useful is the query used to trigger that exception. I'll note that there is some basic tests for this at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (set_test, list_test and map_test) which have some SET operation and are working correctly. And btw, everyone is very much welcome to help adding/improving those tests. attaching my alternative version That refactor lgtm on principle, however: For lists, ListOperation.SET is used for both 'set a column to a literal' and 'set a list element by index', so setting a literal is broken. Nit: for map and sets, the behavior of doSet() relay on the fallthrough of the switch statement, it would be nice to add a comment explaining that the falltrhough is on purpose. In UpdateStatement.prepare(), you replaced the check !(value instanceof Term) by operation.getType() == Operation.Type.COLUMN, which is reversed.
          Hide
          Pavel Yaskevich added a comment -

          Attaching updated alternative patch with all problems fixed, cql_tests are now passing.

          Show
          Pavel Yaskevich added a comment - Attaching updated alternative patch with all problems fixed, cql_tests are now passing.
          Hide
          Pavel Yaskevich added a comment -

          What would be useful is the query used to trigger that exception. I'll note that there is some basic tests for this at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (set_test, list_test and map_test) which have some SET operation and are working correctly. And btw, everyone is very much welcome to help adding/improving those tests.

          I probably did something extraordinary (schema borrowed from Jonathan's comment from 04/Jun/12)

          CREATE TABLE foo(
            k uuid PRIMARY KEY,
            L list<int>,
            M map<text, int>,
            S set<int>
          );
          
          UPDATE ks.foo SET L = [1, 3, 5] WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008';
          UPDATE ks.foo SET L = L + [7, 11, 13] WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008';
          UPDATE ks.foo SET S = {1, 3, 5} WHERE k = 'b017f48f-ae67-11e1-9096-005056c00009';
          UPDATE ks.foo SET S = S + {7, 11, 13} WHERE k = 'b017f48f-ae67-11e1-9096-005056c00009';
          
          Show
          Pavel Yaskevich added a comment - What would be useful is the query used to trigger that exception. I'll note that there is some basic tests for this at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (set_test, list_test and map_test) which have some SET operation and are working correctly. And btw, everyone is very much welcome to help adding/improving those tests. I probably did something extraordinary (schema borrowed from Jonathan's comment from 04/Jun/12) CREATE TABLE foo( k uuid PRIMARY KEY, L list<int>, M map<text, int>, S set<int> ); UPDATE ks.foo SET L = [1, 3, 5] WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008'; UPDATE ks.foo SET L = L + [7, 11, 13] WHERE k = 'b017f48f-ae67-11e1-9096-005056c00008'; UPDATE ks.foo SET S = {1, 3, 5} WHERE k = 'b017f48f-ae67-11e1-9096-005056c00009'; UPDATE ks.foo SET S = S + {7, 11, 13} WHERE k = 'b017f48f-ae67-11e1-9096-005056c00009';
          Hide
          Sylvain Lebresne added a comment -

          Attaching updated alternative patch with all problems fixed

          lgtm, committed.

          schema borrowed from Jonathan's comment from 04/Jun/12

          That was due to the fact that the patch hadn't be updated post CASSANDRA-4329. I've added the trivial fix for that with the commit of the refactor (and added the relevant test to dtests).

          Show
          Sylvain Lebresne added a comment - Attaching updated alternative patch with all problems fixed lgtm, committed. schema borrowed from Jonathan's comment from 04/Jun/12 That was due to the fact that the patch hadn't be updated post CASSANDRA-4329 . I've added the trivial fix for that with the commit of the refactor (and added the relevant test to dtests).
          Hide
          paul cannon added a comment -

          Just a small thing that would make cqlsh's life a little easier in trying to make sense of the 'comparator' field in system.schema_columnfamilies. The formatting in TypeParser.stringifyCollectionParameters is a bit backwards.

          Don't think this justifies a reopen.

          Show
          paul cannon added a comment - Just a small thing that would make cqlsh's life a little easier in trying to make sense of the 'comparator' field in system.schema_columnfamilies. The formatting in TypeParser.stringifyCollectionParameters is a bit backwards. Don't think this justifies a reopen.
          Hide
          Sylvain Lebresne added a comment -

          Thanks, committed as 22fbc4e413da1c61f6d0381be8cfa99.

          Show
          Sylvain Lebresne added a comment - Thanks, committed as 22fbc4e413da1c61f6d0381be8cfa99.

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Jonathan Ellis
              Reviewer:
              Jonathan Ellis
            • Votes:
              3 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development