CASSANDRA-4815

Make CQL work naturally with wide rows

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I find that CQL3 is quite obtuse and does not provide me a language useful for accessing my data. First, let's point out how we should design Cassandra data.

      1) Denormalize
      2) Eliminate seeks
      3) Design for read
      4) Optimize for blind writes

      So here is a schema that abides by these tried and tested rules, which large production users are employing today.
      Say we have a table of movie objects:

      Movie
      Name
      Description
      -< tags (string)
      -< credits composite(role string, name string )
      -1 likesToday
      -1 blacklisted

      The above structure is a movie. Notice it holds a mix of static and dynamic columns, but the overall number of columns is not very large. (Even if it were larger, this would be OK as well.) Notice this table is not just
      a single one-to-many relationship: it has 1-to-1 data and two sets of 1-to-many data.

      The schema today is declared something like this:

      create column family movies
      with default_comparator=UTF8Type and
      column_metadata =
      [
        {column_name: blacklisted, validation_class: int},
        {column_name: likestoday, validation_class: long},
        {column_name: description, validation_class: UTF8Type}
      ];

      We should be able to insert data like this:
      set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
      set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
      set ['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
      set ['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
      set ['Cassandra Database, not looking for a seQL']['tags-action']='';
      set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
      set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
      set ['Cassandra Database, not looking for a seQL']['tags-programming']='';

      This is the correct way to do it: 1 seek to find all the information related to a movie. As long as this row does
      not get "large" there is no reason to optimize by breaking data into other column families. (Notice you cannot transpose this,
      because movies holds two 1-to-many relationships of potentially different types.)
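To make the "one seek, then slice" idea concrete, here is a minimal Python sketch of a single wide row whose column names are kept in sorted order, with a range read over the `tags-` prefix. The `movie_row` dict and the `slice_columns` helper are hypothetical stand-ins for illustration only; they are not Cassandra APIs, and a real comparator works on bytes on disk rather than on Python strings.

```python
from bisect import bisect_left

# Hypothetical in-memory model of one wide row (column name -> value).
movie_row = {
    "blacklisted": "1",
    "likesToday": "34",
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}

def slice_columns(row, start, end):
    """Return (name, value) pairs with start <= name < end, in sorted order."""
    names = sorted(row)  # a comparator would already keep these ordered
    lo = bisect_left(names, start)
    hi = bisect_left(names, end)
    return [(name, row[name]) for name in names[lo:hi]]

# '.' is the ASCII character right after '-', so "tags." is an exclusive
# upper bound for every column name that starts with "tags-".
tags = slice_columns(movie_row, "tags-", "tags.")
credits = slice_columns(movie_row, "credits-", "credits.")
```

Both one-to-many sets come out of the same row, which is the property the description argues for.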

      Let's look at the CQL3 way to do this design:

      First, contrary to the original design of Cassandra, CQL does not like wide rows. It also does not have a good way of dealing with dynamic rows together with static rows.

      You have two options:

      Option 1: lose all schema
      create table movies ( name string, column blob, value blob, primary key(name)) with compact storage.

      This method is not so hot: we have now lost all our validators, and by the way you have to physically shut down everything, rename files, and recreate your schema if you want to inform Cassandra that a current table should be compact. This could at the very least be just a metadata change. Also, you cannot add column schema.

      Option 2: normalize (even worse)

      create table movie (name String, description string, likestoday int, blacklisted int);
      create table movecredits( name string, role string, personname string, primary key(name,role) );
      create table movetags( name string, tag string, primary key (name,tag) );

      This is a terrible design. Of the 4 key characteristics of how Cassandra data should be designed, it fails 3:
      It does not:
      1) Denormalize
      2) Eliminate seeks
      3) Design for read

      Why is Cassandra steering toward this course, by making a language that does not understand wide rows?

      So what can be done? My suggestions:

      Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a
      "virtual view" that is compact storage with no work to migrate data and recreate schemas. Every table should have a compact view for the schemaless, or a simple query hint like /transposed/ should make this change.

      Metadata should be definable by regex. For example, all columns named "tag*" are of type string.
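One way to picture this suggestion is a rule table that maps name patterns to validators. The Python sketch below is purely hypothetical: the `RULES` list, the `validator_for` helper and the first-match-wins policy are assumptions made for illustration and do not describe any existing Cassandra feature.

```python
import re

# Hypothetical pattern -> validator rules; the first matching rule wins.
RULES = [
    (re.compile(r"^tag"), "UTF8Type"),       # all columns named "tag*"
    (re.compile(r"^credits-"), "UTF8Type"),
    (re.compile(r"^likes"), "LongType"),
]
DEFAULT_VALIDATOR = "BytesType"              # fallback when nothing matches

def validator_for(column_name):
    """Pick a validator name for a dynamic column based on its name."""
    for pattern, validator in RULES:
        if pattern.search(column_name):
            return validator
    return DEFAULT_VALIDATOR
```

With such rules, `tags-action` and `tags-romance` would both validate as strings without being declared one by one.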

      CQL should have the column[slice_start] .. column[slice_end] operator from cql2.

      CQL should support current users. Users should not have to
      switch between CQL versions, and possibly thrift, to work with wide rows. The language should work for them even if
      it is not expressly designed for them. Some of these features are already part of cql2, so they should be carried over.

      Also, what needs to not happen is someone making a hand-waving statement
      like "Once we have collection types we will not need wide rows". This request is to satisfy current users of Cassandra, not future or theoretical ones. Solutions should not involve physically migrating data in any way, and they should not involve telling someone to do something they are already doing much differently. The suggestions should revolve around making the query language work well with existing data.

      1. table.png
        105 kB
        Edward Capriolo
      2. cql feature set updated.png
        126 kB
        Jonathan Ellis

        Issue Links

          Activity

          Edward Capriolo created issue -
          Edward Capriolo made changes -
          Field   | Original Value                          | New Value
          Summary | Make CQL3 work naturally with wide rows | Make CQL work naturally with wide rows
          Nick Bailey added a comment -

          Isn't this the main reason behind collections support?

          CREATE TABLE movies (
            movie_id int PRIMARY KEY,
            blacklisted int,
            credits map<text, text>,
            description text,
            likes_today int,
            name text,
            tags set<text>
          );
          
          T Jake Luciani added a comment -

          I agree CQL3 is a step towards requiring more "schema"... I think for a lot of people that's a good thing and others it's not.

          The core of the issue here IMO is not how we can change CQL3 to fit your use case. It's whether CQL3 will eventually be the only way to access Cassandra in N years, or whether we can always rely on there being the old, more schemaless API.

          Edward Capriolo added a comment -

          As I mentioned towards the end of the ticket, this feature request is not to support future theoretical use cases, it is to support the dominant current use case. It is not just my use case. It is the use case that the Cassandra project originally advocated.

          http://www.slideshare.net/lomakin.andrey/apache-cassandra-part-1-principles-data-model
          slide 30
          -columns aren't fixed
          -columns can be sorted
          -columns can be queried for a certain range

          I am fine if Cassandra adds new features that benefit from more schema. I am fine with Cassandra adding collections and think these are a great idea. But I see no technical reason why CQL can't support both old and new use cases. This is especially disturbing since the project offers no elegant way to get from now to the future. Switching to COMPACT STORAGE is a pain, and rewriting all the data into a new collection-based design is not necessarily a good use of resources.

          Someone once told me Avro was the future of Cassandra. I am asking for features to support the now.

          Jonathan Ellis added a comment - - edited

          Hi Ed,

          I wrote a longish blog post over at http://www.datastax.com/dev/blog/cql3-for-cassandra-experts showing how use cases like this are handled in CQL3, with no rewriting of data. Give that a read and let me know if you have further questions!

          (All the examples in that post, except for the one using Set, are from Cassandra 1.1.6 and are forwards-compatible with 1.2.)

          Edward Capriolo added a comment -

          Thanks for building that Jonathan. It clears up a couple things. I still have some questions/possible feature requests.

          Can I create a schema less table?

          cqlsh:testkeyspace> create table simple (a varchar, primary key(a) );
          Bad Request: No definition found that is not part of the PRIMARY KEY
          
          Edward Capriolo added a comment -

          The syntax suggests you should be able to create a table that only has a primary key.

          CREATE TABLE <cfname> ( <colname> <type> PRIMARY KEY [,
                                          <colname> <type> [, ...]] )
                         [WITH <optionname> = <val> [AND <optionname> = <val> [...]]];
          

          I think a user SHOULD be able to do this. Because Cassandra can be schemaless, CQL should provide a way to create this type of table (since we can SELECT * from tables created from the CLI).

          Jeremy Hanna added a comment -

          I let Ed know that in 1.2 there was support for creating a table with only a primary key (thanks Patrick). He did ask a good question - is CQL3 going to be relatively set in stone in 1.2? If people implement to CQL3, that's not going to change is it?

          Edward Capriolo added a comment -

          This is possibly more nitpicky because it seems hard to do.

          create column family compositetry with key_validation_class=UTF8Type and comparator='CompositeType(UTF8Type,UTF8Type)';

          [default@testkeyspace] set compositetry ['a']['b:c']=UTF8('d');
          [default@testkeyspace] set compositetry ['a']['d:e']=UTF8('f');
          [default@testkeyspace] set compositetry ['a']['h:i']=UTF8('j');

          cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' and column1<'h';
           key | column1 | column2 | value
          -----+---------+---------+-------
             a |       b |       c |    64
             a |       d |       e |    66

          cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' and column1<'h' and column2>='c';
          Bad Request: PRIMARY KEY part column2 cannot be restricted (preceding part column1 is either not restricted or by a non-EQ relation)
          Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.

          I guess this is slightly more difficult to express composite slices.
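The rule stated in the error message above can be modelled in a few lines of Python. This is only a sketch of the rule as that message words it (a key part may be restricted only if every preceding part is restricted by equality); the function name and the `"EQ"`/`"RANGE"` labels are hypothetical, and other Cassandra versions may apply different rules.

```python
def first_rejected_part(clustering_columns, restrictions):
    """Return the first column whose restriction would be refused, or None.

    restrictions maps a column name to "EQ" or "RANGE"; a column missing
    from the mapping is unrestricted.
    """
    prefix_all_eq = True  # True while every preceding part is restricted by EQ
    for name in clustering_columns:
        kind = restrictions.get(name)
        if kind is None:
            prefix_all_eq = False
            continue
        if not prefix_all_eq:
            return name
        if kind != "EQ":
            prefix_all_eq = False
    return None

columns = ["column1", "column2"]
# column1>='b' and column1<'h' alone: accepted
ok = first_rejected_part(columns, {"column1": "RANGE"})
# adding column2>='c' after a range on column1: refused at column2
bad = first_rejected_part(columns, {"column1": "RANGE", "column2": "RANGE"})
```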

          Edward Capriolo added a comment -

          Also from the article this statement:

          For the song tags, we have two choices. If we need to be compatible with data from an old-style schema, we can do that as follows:

          CREATE TABLE song_tags (
          id uuid,
          tag_name text,
          PRIMARY KEY (id, tag_name)
          );

          What does this mean? 'We can do that'. If we have an old style schema don't we need to be able to alter a current table. Which can't be done.

          cqlsh:testkeyspace> CREATE TABLE song_tags ( id uuid, tag_name text, b text, PRIMARY KEY (id, tag_name) );
          Bad Request: org.apache.cassandra.config.ConfigurationException: Cannot add already existing column family 'song_tags' to keyspace 'testkeyspace'.

          This is why I suggest VIEW tables make sense. All the CQL2 / CQL3 tables look like logical constructs on top of physical column families. Maybe defining multiple logical tables that store data in the same physical ones is the best bet long term.

          Edward Capriolo added a comment -

          Also, without reading much of the background tickets, I am pretty curious about the syntax

          PRIMARY KEY (id, tag_name)
          

          What is going to happen if Cassandra and the CQL language actually adds true composite row keys?

          Edward Capriolo added a comment -

          So in Cassandra 1.2.0 you can create a table with no columns other than the primary key....

          cqlsh:testkeyspace> create table sample ( keycolumn varchar, primary key (keycolumn) );
          

          But you cannot insert into it.
          cqlsh:testkeyspace> insert into sample ( keycolumn, 'age' ) values ('ed','30') ;

          And rather surprisingly it creates a table with metadata I did not ask for. It assumes the comparator is a composite of a single UTF8Type

          create column family sample
            with column_type = 'Standard'
            and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type)'
            and default_validation_class = 'UTF8Type'
          

          Not what I was going for

          Nick Bailey added a comment -

          What is going to happen if Cassandra and the CQL language actually adds true composite row keys?

          CASSANDRA-4179

          Edward Capriolo added a comment -

          Working with CQL today another idea came to me. Does it make sense to implement CLI like SET and GET? SET and GET are actually fairly natural ways to work with schema-less Cassandra. Also, in terms of performance, a CLI set statement is smaller than the equivalent insert into. This would serve as a no-nonsense way to get data into a CF.

          Sylvain Lebresne added a comment -

          Can I create a schema less table?

          Yes. The following as-schemaless-as-can-possibly-be thrift/cli definition:

          create column family schemaless
            with key_validation_class = BytesType
            and comparator = BytesType
            and default_validation_class = BytesType
          

          is equivalent to the following CQL3 definition

          CREATE TABLE schemaless (
            key blob,
            column blob,
            value blob,
            PRIMARY KEY (key, column)
          ) WITH COMPACT STORAGE
          

          And to be clear, when I say equivalent, I mean equivalent. If you create the first definition above, you can use the column family in CQL3 as if it was defined by the second definition (as in, you don't have to do the CREATE TABLE itself), or you can create the table in CQL3 first with the second query and query it in thrift exactly as if it had been created by the first definition.

          The composite primary key is what tells CQL3 that it's a "transposed" wide CF. In other words, in CQL3, 'key' will map to the row key, 'column' will map to the internal column name and 'value' will map to the internal column value. I note that 'key', 'column' and 'value' are the default names that CQL3 picks for you when you haven't explicitly defined more user-friendly ones (in other words, when you upgrade from thrift). CASSANDRA-4822 is open to allow you to rename those default names to more user-friendly ones if you so wish (and to be clear, doing so has no impact whatsoever on what is stored; it just declares the new names as CQL3 metadata).
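The "transposed" mapping described above can be sketched in Python: each internal column of one storage row becomes one CQL3 row carrying the row key, the column name and the column value. The `transpose` helper and the sample bytes are hypothetical and only model the idea; they say nothing about how Cassandra implements it.

```python
def transpose(row_key, columns):
    """Model one storage row as CQL3-style rows, one per internal column.

    columns maps internal column name -> internal column value.
    Rows come back ordered by column name, as a comparator would order them.
    """
    return [
        {"key": row_key, "column": name, "value": value}
        for name, value in sorted(columns.items())
    ]

# One thrift-style wide row with two internal columns.
rows = transpose(b"a", {b"h:i": b"j", b"b:c": b"d"})
```

Renaming 'key', 'column' and 'value' would change only the dictionary keys in this model, not the stored bytes.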

          I guess this is slightly more difficult to express composite slices.

          It's possibly nitpicking, but I would talk of a difficulty in properly paginating composites. But yes, that's one of the very few things that CQL3 is not currently very good at. But we'll fix it (and the good thing about having a query language is that it will be trivial to fix it without a backward-incompatible breaking change). That being said, I do believe that once you start doing real-life examples, it's not really a blocker. Most of the time, when you use composites in real life, you want to slice over one of the components, which works fine. That's why it's really more a problem for slightly more complex pagination over composite wide rows. There is also CASSANDRA-4415, which will remove the need for a good part of the manual pagination people do right now.

          If we have an old style schema don't we need to be able to alter a current table.

          As explained above, "thrift" CFs are directly accessible from CQL3 (without any redefinition, and that's why trying to create the table in CQL3 is not legal). However, you won't get nice column names if you do so (but rather the 'key', 'column' and 'value' generic names above). Again, CASSANDRA-4822 will allow you to declare nice names without having to do a complex operation (like trashing your thrift schema so that CQL3 allows the redefinition).

          What is going to happen if Cassandra and the CQL language actually adds true composite row keys?

          It does already: CASSANDRA-4179. You just declare

          PRIMARY KEY ((id_part1, id_part2), tag_name).
          
          Jonathan Ellis made changes -
          Comment [ These are good questions and I wanted to reach a wider audience than the Jira followers, so I wrote a blog post to address the questions here: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

          Please let me know if that clarifies things.

          (Note that all the examples there work in 1.1 as well as 1.2, with the exception of the cql3 CREATE and ALTER for song_tags. Those require 1.2. All the pasted output is in fact from 1.1.)
          ]
          Jonathan Ellis added a comment -

          To add to that, if you want "mostly static columns, but some schemaless" then you can throw the "schemaless" ones in a Map. This will NOT be easily accessible from Thrift – but it's a good example of the kinds of things that cql3 makes easier.

          Jonathan Ellis added a comment -

          What does this mean? 'We can do that'. If we have an old style schema don't we need to be able to alter a current table.

          Only if you want to add meaningful names. That's what this next part is saying:

          "If we simply use the old schema directly as-is, Cassandra will give cell names and values autogenerated CQL3 names: column1, column2, and so forth. Here I’m accessing the data inserted earlier from CQL2, but with cqlsh --cql3:"

          SELECT * FROM song_tags;
          
          id                                   | column1 | value
          --------------------------------------+---------+-------
          8a172618-b121-4136-bb10-f665cfc469eb |    2007 |
          8a172618-b121-4136-bb10-f665cfc469eb |  covers |
          a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |    1973 |
          a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |   blues |
          

          ... that said, as Sylvain points out we do have CASSANDRA-4822 open to allow changing those default names without dropping and recreating the table definition.

          Does it make sense to implement CLI like SET and GET?

          Not in CQL-the-language, and I don't think even in cqlsh-the-utility. I understand the appeal of the convenience, but the abstraction leakage it would introduce threatens to undo all the work we're doing to make CQL3 something you can use on its own terms.

          (As far as performance goes, prepared statements make the length of the string being parsed initially a non-issue.)

          Edward Capriolo added a comment -
          Not in CQL-the-language, and I don't think even in cqlsh-the-utility. I understand the appeal of the convenience, but the abstraction leakage it would introduce threatens to undo all the work we're doing to make CQL3 something you can use on its own terms.

          But what if I want to think of Cassandra as a memcache, not a relational database? This is one of my ticket points: CQL should support all the use cases it can. You are calling it abstraction leakage, but I think of it as a natural way of working with Cassandra. But I do agree that SELECTs are better than cli 'get' in most cases. What is missing is the SET side.

          CREATE TABLE schemaless (
            key blob,
            column blob,
            value blob,
            PRIMARY KEY (key, column)
          ) WITH COMPACT STORAGE
          

          Now to insert into this table I need to format everything as hex.

          INSERT INTO SCHEMALESS (key,column,value) VALUES ('HEX','HEX','HEX');

          The CLI has many useful functions like ascii(' '), or utf8(' '). Assume does not seem to have an effect here. This is discussed in CASSANDRA-3799.
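For reference, the hex form of a text value is easy to compute outside cqlsh. The Python sketch below uses two hypothetical helpers, `to_hex` and `from_hex`, and assumes UTF-8 text; how the hex digits must then be quoted or prefixed in the INSERT depends on the CQL version and is not covered here.

```python
def to_hex(text, encoding="utf-8"):
    """Encode text to the hex digits of its bytes."""
    return text.encode(encoding).hex()

def from_hex(hex_digits, encoding="utf-8"):
    """Decode hex digits back to text."""
    return bytes.fromhex(hex_digits).decode(encoding)

key_hex = to_hex("a")
value_hex = to_hex("d")
```

This matches the cqlsh output earlier in the thread, where the values written as UTF8('d') and UTF8('f') are displayed as 64 and 66.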

          Edward Capriolo made changes -
          Link This issue relates to CASSANDRA-3799 [ CASSANDRA-3799 ]
          Edward Capriolo added a comment -

          Functions like ascii() are nice for type conversion.

          Sylvain Lebresne added a comment -

          Now to insert into this table I need to format everything as hex

          Not for prepared statements, where all the values will be in binary. What I mean here is that as far as CQL-the-language is concerned, you can absolutely use it to think of Cassandra as a memcache (in fact, I'd say that the hard part would be to think of Cassandra as a relational database, because it's not a relational database, at least not a full-blown one, and CQL doesn't change that).

          Now if the remark is that it's less convenient to work with blobs in cqlsh than it was with the cli, then I can agree to that and I'm fine trying to fix it, but let's maybe keep that to CASSANDRA-3799.

          Edward Capriolo added a comment -

          For reference, this is a table of what each of the "clients" has.

          Edward Capriolo made changes -
          Attachment table.png [ 12550473 ]
          Jonathan Ellis added a comment -

          Updated chart attached.

          SET and GET are syntactic features and are out of place in a functionality discussion.

          Jonathan Ellis made changes -
          Attachment cql feature set updated.png [ 12550479 ]
          Jonathan Ellis made changes -
          Attachment cql feature set updated.png [ 12550481 ]
          Edward Capriolo added a comment -

          @Jonathan agreed. My main concern is that a user can set schema-less columns. This currently looks possible with compact storage tables but not possible with non-compact storage tables.

          I attached that table to show what features a user has with one client vs. the other. I am not necessarily arguing that CQL should have a given feature in that table. I was only trying to show that, depending on the client, users have access to some features and not others.

          I also wanted to highlight how all the different clients have strengths and deficiencies. Internally I have to "sell things to people", and I just wanted to show that CQL and cqlsh are weak in comparison to the CLI for schema-less columns.

          My QA person, as an example, learned how set and assume worked in the CLI, and they had functions like ascii(). These things are missing and people are affected.

          So my table is not meant to list all the things CQL should support, just to show the reality of what users are faced with.

          Edward Capriolo added a comment -

          @Jonathan
          I disagree with the attachment. For "slice composites" in CQL3 you have 'YES'. But with compact storage we can only slice based on the first value of the composite. This is why I said 'KINDA', because a composite might be a very wide row. If 100,000 columns share the first composite value 5 and the second part of the composite has high cardinality, that row cannot be sliced effectively.

          The way I would say this is "CQL3 can effectively slice the composites it created in schema-full tables, CQL3 can slice only on the first column of composite in a schema-less table".

          Sylvain agreed with this above

          It's possibly nitpicking, but I would talk of a difficulty in properly paginating composites. But yes, that's one of the very few things that CQL3 is not currently very good at. But we'll fix it (and the good thing about having a query language is that it will be trivial to fix it without a backward incompatible breaking change).

          Sylvain Lebresne added a comment -

          Though there is some truth about slice on composites currently having a few limitations in CQL3, "CQL3 can slice only on the first column of composite" is not true either (regardless of it being a schema-less or a compact/non-compact table). I've just created CASSANDRA-4851 to lift that limitation (and as I explain in that ticket, you can slice any component, not only the first one, but you cannot page simultaneously on both in a way, which imo is only useful for pagination in real life). Nevertheless, it is a current limitation, but let it be clear that we intend to fix it.

          I have the feeling that there is a misunderstanding, in that some seem to believe that we intend to limit the possible use cases for Cassandra with CQL3. That is absolutely not the case. In fact, aside from CASSANDRA-4851 (which I think is fairly specific) and creating a secondary index on a specific column of a wide row (a feature that I've only ever seen one person using, and even he agreed that it was kind of a hack, and for which CASSANDRA-3782 is open nonetheless), I'm not aware of any use cases that thrift supports but CQL3 doesn't. And when I say that, I'm including things as dynamic as using DynamicCompositeType (which I don't particularly encourage anyone to use btw; I'm still looking for a compelling use case where it is truly necessary). That is, CQL3 doesn't provide any nice syntax to work with DynamicCompositeType, but you can still use it the same way you do in thrift (the syntax will be pretty much as convenient as in thrift, that is, not very convenient at all, but you can do it and it's no worse than in thrift).
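
          As an illustration of the pagination difficulty discussed above, here is a small Python sketch. It is a conceptual model only, not Cassandra code: a wide row is modeled as a sorted list of two-part composite column names, and next_page and per_component_filter are hypothetical helpers. Under that assumption, resuming after the last column seen needs a comparison on the whole composite, which is not the same as a separate condition on each component.

```python
from bisect import bisect_right

# Conceptual model (an assumption, not Cassandra internals): one wide row
# whose column names are composites (a, b), stored in sorted order.
columns = sorted((a, b) for a in range(3) for b in range(4))


def next_page(cols, last, size):
    """Return up to `size` columns strictly after `last` in composite order."""
    start = bisect_right(cols, last)
    return cols[start:start + size]


def per_component_filter(cols, last):
    """Independent conditions on each component: a > last_a AND b > last_b."""
    return [c for c in cols if c[0] > last[0] and c[1] > last[1]]


if __name__ == "__main__":
    last_seen = (0, 2)
    print(next_page(columns, last_seen, 3))
    print(per_component_filter(columns, last_seen))
```

          In this model the per-component filter skips columns such as (0, 3) and (1, 0) that come right after (0, 2) in sorted order, which is why paging wants a comparison on the composite as a whole.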

          Edward Capriolo added a comment - - edited

          I do not think set and get are syntactic features that should be out of this discussion. I was doing some blogging this weekend and came to the re-realization that BigTable just provides a simple low-level API. Given that Cassandra is based on BigTable, it is strange to argue that a simple set has no place and that everything needs to be a query.

          Thinking further into this I think the new transport only being able to execute CQL queries is a huge defect. We are going to continually have these discussions about what we can and can't do in CQL, that we can do in thrift.

          We should not have to spend time designing CQL features to solve impedance mismatches between RPC and query languages, and we should not be redesigning Cassandra so every operation fits into a CQL language.

          We have to face reality: it is going to be quite awkward for clients to maintain multiple connection pools for client requests, one for thrift, one for cql2, one for cql3, one for cql4, etc. The new transport should be able to piggyback thrift requests somehow, so that a user only needs to maintain a single client connection.

          Jonathan Ellis added a comment -

          I'm still not sure we're on the same page as far as GET and SET go.

          I'm saying that functionally, if you have

          create column family test;
          

          (all cli defaults – everything is bytes), then

          set test['ff']['dd'] = 'cc';
          

          in the cli (translation to Thrift left as an exercise for the reader) is EXACTLY the same as

          insert into test(key, column1, value) values ('ff', 'dd', 'cc');
          

          in cql.

          If you think we're missing functionality here then let's clear that up. But if you're hung up on the syntax then we'll have to agree to disagree.
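
          The equivalence described above can be sketched as a purely textual mapping. This is a hedged illustration in Python, not part of any Cassandra tool: cli_set_to_cql_insert is a hypothetical helper that handles only the simple quoted form shown in this comment and assumes the default column names key, column1 and value.

```python
import re

# Matches only the simple form: set cf['key']['column'] = 'value';
_SET_RE = re.compile(
    r"^\s*set\s+(\w+)\['([^']*)'\]\['([^']*)'\]\s*=\s*'([^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def cli_set_to_cql_insert(statement: str) -> str:
    """Rewrite a simple CLI `set` statement as the equivalent CQL insert."""
    match = _SET_RE.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    table, key, column, value = match.groups()
    return (
        f"insert into {table}(key, column1, value) "
        f"values ('{key}', '{column}', '{value}');"
    )


if __name__ == "__main__":
    print(cli_set_to_cql_insert("set test['ff']['dd'] = 'cc';"))
```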

          Jonathan Ellis made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Not A Problem [ 8 ]
          Transition: Open -> Resolved
          Time In Source Status: 132d 8h 32m
          Execution Times: 1
          Last Executer: Jonathan Ellis
          Last Execution Date: 25/Feb/13 22:48

            People

            • Assignee:
              Unassigned
              Reporter:
              Edward Capriolo
            • Votes:
              2
              Watchers:
              11
