Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.1 beta1
    • Component/s: API, Core
    • Labels:
      None

      Description

      A typical use case for a collection could be to store a bunch of addresses in a user profile. An address could typically be composed of a few properties: say a street, a city, a postal code and maybe a few phone numbers associated to it.

      To model that currently with collections, you might use a map<text, blob>, where the map key is a string identifying the address and the value is all of the address's information serialized manually (you can use text instead of blob and put everything in a string if you prefer, but the principle is the same).

      This ticket suggests making this more user-friendly by allowing:

      CREATE TYPE address (
        street text,
        city text,
        zip_code int,
        phones set<text>
      )
      
      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        addresses map<text, address>
      )
      

      Under the hood, that type declaration would just be metadata on top of CompositeType (which does mean a limitation would be that we wouldn't allow re-ordering or removal of fields in a custom TYPE). Namely, the address type would be in practice a CompositeType(UTF8Type, UTF8Type, Int32Type, SetType(UTF8Type)) + some metadata that records the name of each component. In other words, this would mostly be user-friendly syntactic sugar to create composite blobs.
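
As a rough illustration of the "names are only metadata" idea, here is a minimal Python sketch. It is not Cassandra's implementation: the `ADDRESS` list, the `encode` helper, and the per-component layout (2-byte length, raw bytes, one trailing zero byte) are simplifying assumptions made for this example, and the `phones` set is left out for brevity.

```python
import struct

# Hypothetical model of a user type: an ordered list of (field name, encoder).
# Values are written positionally, so the names never reach the stored bytes.
ADDRESS = [
    ("street", lambda v: v.encode("utf-8")),
    ("city", lambda v: v.encode("utf-8")),
    ("zip_code", lambda v: struct.pack(">i", v)),  # 4-byte big-endian int
]


def encode(fields, value):
    """Serialize each field in declaration order.

    Assumed layout per component: 2-byte length, raw bytes, one 0x00 byte.
    """
    out = b""
    for name, to_bytes in fields:
        raw = to_bytes(value[name])
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out


home = {"street": "1 Main St", "city": "SF", "zip_code": 94102}
blob = encode(ADDRESS, home)

# Renaming a field only touches metadata: the stored bytes are identical.
RENAMED = [("street", ADDRESS[0][1]), ("town", ADDRESS[1][1]), ("zip_code", ADDRESS[2][1])]
renamed_blob = encode(RENAMED, {"street": "1 Main St", "town": "SF", "zip_code": 94102})

# Appending a field keeps old values readable: the old blob is a prefix.
EXTENDED = ADDRESS + [("country", lambda v: v.encode("utf-8"))]
extended_blob = encode(EXTENDED, dict(home, country="US"))
```

Reordering or removing a field would change which bytes land at which position, which is why the ticket rules those operations out.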

      I'll note that this would also be useful outside collections, as it might sometimes be more efficient/useful to have such a simple composite blob. For instance, you could imagine having:

      CREATE TYPE fullname (
        firstname text,
        lastname text
      )
      

      and to rewrite the users table above as

      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name fullname,
        addresses map<text, address>
      )
      

      In terms of inserts, we'd need a syntax for these new "structs". It could be:

      INSERT INTO users (id, name)
                 VALUES (2ad..., { firstname: 'Paul', lastname: 'smith'});
      UPDATE users
         SET addresses = addresses + { 'home': { street: '...', city: 'SF', zip_code: 94102, phones: {} } }
         WHERE id=2ad...;
      

      where the difference with a map is that the "key" would be a column name (in the CQL3 sense), not a value/literal. Though we might find that a bit confusing and find some other syntax.

      On the query side, we could optionally allow things like:

      SELECT name.firstname, name.lastname FROM users WHERE id=2ad...;
      

      One open question, however, is what type we send back in the result set for a query like:

      SELECT name FROM users WHERE id=2ad...;
      

      We could:

      1. return only the fact that it's the user-defined type named fullname, but that implies the client has to query the cluster metadata to find the definition of the type.
      2. return the full definition of the type every time.

      I also note that, client side, it might be a tad harder to support such types cleanly in statically typed languages than in dynamically typed ones, but that's not the end of the world either.

        Issue Links

          Activity

          Sylvain Lebresne created issue -
          Jonathan Ellis made changes -
          Field Original Value New Value
          Fix Version/s 2.1 [ 12324159 ]
          Component/s API [ 12313742 ]
          Component/s Core [ 12312978 ]
          Sylvain Lebresne made changes -
          Assignee Sylvain Lebresne [ slebresne ]
          Aleksey Yeschenko added a comment -

          Re: reserved types.

          Suggesting the following list, or some subset of it:

          • byte
          • smallint (2bytes)
          • complex
          • enum
          • money
          • date (just the date, no time, 4 bytes)
          • interval (time)
          • macaddr
          • bitstring

          Geometric types, if we ever decide to go that way:

          • point
          • line
          • lseg
          • box
          • path
          • polygon
          • circle
          Sylvain Lebresne added a comment -

          Attaching patch for this at https://github.com/pcmanus/cassandra/commits/5590 (5 last patches).

          The syntax is the one of the description above (including the syntax to select specific fields). The user types are global (they are not keyspace scoped for instance; figured scoping them would be annoying in practice without any benefits that I can see).

          Internally, the cell value for a user type uses the same format as CompositeType (a feature) and the new 'UserType' AbstractType is in fact just a subtype of CompositeType with additional metadata. This makes it possible for users using a CompositeType as a CQL3 column value today to update to a user type later.

          The patches implement 3 new CQL3 statements: CREATE TYPE, ALTER TYPE and DROP TYPE. DROP only lets you drop types that are not in use in any existing table. ALTER allows you to:

          1. rename fields
          2. alter the type of an existing field (provided the new type is compatible with the old one).
          3. rename the type itself
          4. append new fields to an existing type: we can't allow dropping fields however, so it's a one-way street.
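
The field-level rules in the list above can be sketched as a small validity check. This is only an illustration in Python, not the patch's logic: `check_alter` and its `compatible` parameter are hypothetical names, and type compatibility is reduced to plain equality by default.

```python
def check_alter(old_fields, new_fields, compatible=lambda old, new: old == new):
    """Return None if new_fields is an allowed evolution of old_fields, else a reason.

    Fields are (name, type_name) pairs in declaration order. `compatible` is a
    hypothetical stand-in for a real type-compatibility check.
    """
    if len(new_fields) < len(old_fields):
        return "fields cannot be dropped"
    # Existing positions must keep a compatible type; names are free to change.
    for (old_name, old_type), (_, new_type) in zip(old_fields, new_fields):
        if not compatible(old_type, new_type):
            return f"type of field {old_name!r} changed incompatibly"
    names = [name for name, _ in new_fields]
    if len(set(names)) != len(names):
        return "duplicate field name"
    return None


address = [("street", "text"), ("city", "text"), ("zip_code", "int")]

renamed = [("street", "text"), ("town", "text"), ("zip_code", "int")]
appended = address + [("country", "text")]
dropped = address[:2]
reordered = [address[0], address[2], address[1]]
```

Because values are stored positionally, this check cannot tell a swap of two same-typed fields apart from two renames; it only rejects changes that would alter how existing bytes are interpreted.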

          I'll note that currently we send the full user type definition in the result set metadata, like we do for custom types, meaning a pretty long string with fully qualified Java class names in it. It's not a huge deal in itself, since the v2 protocol allows skipping said metadata for prepared statements (and for non-prepared statements too, provided the client has some means to figure out the metadata on its own). That being said, we should add some special code for user types in the native protocol to make this a lot more compact, but that means starting the v3 version of the protocol, and that can probably be left to a follow-up ticket.

          The patches also don't update cqlsh to make it understand user type values so they currently show as blobs. Also can be done in a followup ticket imo.

          Note: the last patch makes the names in Aleksey's first list above unavailable as user type names (reserving them for potential future use). I haven't added the geometric ones though, because I'm not too keen on the idea of ever doing geometric stuff in core Cassandra (but if someone feels strongly that we should reserve those nonetheless, I'm not going to fight it).

          Jonathan Ellis added a comment -

          The syntax is the one of the description above

          I should have commented sooner, sorry for that. But I'm not sold on json-for-updates, dotted-fields-for-queries. I'd prefer using one or the other, e.g. for dotted-fields-everywhere,

          INSERT INTO users (id, name.firstname, name.lastname)
                     VALUES (2ad..., 'Paul', 'smith');
          
          Sylvain Lebresne added a comment -

          We can't have "just" the dotted notation because that doesn't work inside collections (which are a primary motivation for this). So we do need a 'compound value literal' notation (we can debate whether the current one is fine or not, of course; I do happen to like the current one all right though). We'll need such a literal notation in cqlsh (and more generally in documentation/slides) too, to display the results of SELECT name FROM foo when name is a user type value.

          So if anything, I think we can drop the dot notation on selects as that one is really just a convenience. But tbh, I don't see why we wouldn't allow such convenience. I really doubt this could confuse anyone at least, as it seems to me those notations are very standard.

          Note that I wouldn't be totally opposed to adding support for the dot notation on inserts (provided all fields are indeed given) for symmetry/convenience. Though as far as personal preference goes, I could live without it (seems less useful to me than in selects).
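
To make the 'compound value literal' point concrete, here is a minimal Python sketch of how a tool might render a user type value nested inside a map using the notation from the description. `UserValue` and `to_literal` are hypothetical names invented for this example; this is not cqlsh code.

```python
from collections import namedtuple

# Hypothetical stand-in for a decoded user type value: ordered (field, value) pairs.
UserValue = namedtuple("UserValue", ["fields"])


def to_literal(value):
    """Render a value using the literal notation proposed in the description."""
    if isinstance(value, UserValue):
        # Field names are identifiers, so they are not quoted.
        body = ", ".join(f"{name}: {to_literal(v)}" for name, v in value.fields)
        return "{ " + body + " }"
    if isinstance(value, dict):
        # Map keys are ordinary values, so text keys are quoted.
        body = ", ".join(f"{to_literal(k)}: {to_literal(v)}" for k, v in value.items())
        return "{ " + body + " }"
    if isinstance(value, (set, frozenset)):
        return "{" + ", ".join(sorted(to_literal(v) for v in value)) + "}"
    if isinstance(value, str):
        # Escape single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


home = UserValue([("street", "1 Main St"), ("city", "SF"), ("zip_code", 94102), ("phones", set())])
print(to_literal({"home": home}))
```

The only visible difference between the map and the user type value is whether the part before each colon is quoted, which is the map look-alike concern raised in this thread.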

          Jonathan Ellis added a comment -

          We can't have "just" the dotted notation because that doesn't work inside collections

          Well.

          Crap.

          I don't like that this makes it syntactically identical to a Map, but I don't have a better idea.

          Jonathan Ellis made changes -
          Reviewer Aleksey Yeschenko [ iamaleksey ]
          Jonathan Ellis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          made changes -
          Status Patch Available [ 10002 ] Testing [ 10012 ]
          Brandon Williams made changes -
          Status Testing [ 10012 ] Open [ 1 ]
          Aleksey Yeschenko made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sylvain Lebresne added a comment -

          Pushed rebased version against current trunk at https://github.com/pcmanus/cassandra/commits/5590-v2

          Aleksey Yeschenko added a comment -

          1. UserType.isCompatibleWith() has redundant logic - (duplicates CT.isCompatibleWith() except for the type name comparison bit)
          2. DefsTables.mergeTypes() 'extension' check is wrong for equal-sized yet compatible user types (it will simply pick u1 even if u2 is the most recent one, timestamp-wise)
          3. UserTypes.Literal.toString() is slightly wrong
          4. the cf name for user types should probably be schema_usertypes or schema_user_types - for consistency's sake (and to be able to continue to reference them as system.schema_* cfs)
          5. ALTER TYPE RENAME for the complete type seems to just make a copy of the original type with a new name. Should drop the original type. Also, should update all the user types that have the type that's being renamed, updated.. and all the cfs that were using that. IMHO the complexity is not worth it and we should just drop ALTER TYPE RENAME (the full type rename variant) entirely. There is always CREATE/DROP/ALTER if a user has to perform a rename, which should be rare.
          6. For auth, we should create a new Resource object and not reuse DataResource (GRANT/REVOKE CREATE/ALTER/DROP ON (ALL TYPES|TYPE <name>) TO/FROM <user> - at least ALL TYPES for now). And/or check for ALTER on all the tables affected by a rename/add. Will create a separate ticket for that.

          Attaching a patch that corrects 1-4, with a dose of OCD on top (feel free to ignore the OCD part or the whole patch if you've got better versions in mind).

          Will also create tickets for cqlsh support (DESCRIBE, SELECT, completion for create/alter/drop type, updated grant/revoke, etc.)

          Aleksey Yeschenko made changes -
          Attachment ocd-and-corrections-patch.txt [ 12611383 ]
          Aleksey Yeschenko added a comment -

          (I'm fine with the current checkAccess() implementations, so that's not a blocker).

          Sylvain Lebresne added a comment -

          Pushed at https://github.com/pcmanus/cassandra/commits/5590-v3 a version rebased and with the following fixes:

          Attaching a patch that corrects 1-4

          Looks good, added to v3.

          ALTER TYPE RENAME for the complete type seems to just make a copy of the original type with a new name. Should drop the original type. Also, should update all the user types that have the type that's being renamed, updated.. and all the cfs that were using that

          We are already updating all the CFs that use the modified type. Not updating the other user types was indeed an oversight, but that's necessary for all the operations of ALTER TYPE anyway, so I added it. What remained was dropping the original type, but that's rather trivial. So overall I've kept type renaming but I've fixed it.

          For auth, we should create a new Resource object

          I wholeheartedly agree, but as you said, it's probably fine to tackle in a follow-up ticket.

          Aleksey Yeschenko added a comment -

          +1

          Sylvain Lebresne added a comment -

          Alright, committed, thanks

          Sylvain Lebresne made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Aleksey Yeschenko made changes -
          Link This issue relates to CASSANDRA-6304 [ CASSANDRA-6304 ]
          Aleksey Yeschenko made changes -
          Link This issue relates to CASSANDRA-6305 [ CASSANDRA-6305 ]
          Aleksey Yeschenko added a comment -

          Created CASSANDRA-6305 to handle cqlsh and CASSANDRA-6304 to deal with authorization.

          Ryan McGuire made changes -
          Link This issue relates to CASSANDRA-6312 [ CASSANDRA-6312 ]
          Ryan McGuire added a comment -

          The patches also don't update cqlsh to make it understand user type values so they currently show as blobs. Also can be done in a followup ticket imo.

          Shouldn't the fix be in the driver, not in cqlsh? It's really hard to deal with a statement like this for instance:

          SELECT addresses FROM user WHERE id=2ad...;
          

          because addresses is a collection, we can't use the dot trick to get a single value.

          By the way, I committed a dtest that implements the example in this ticket, but it's missing some assertions around this until I can figure out how to deserialize the value contained in a collection.

          Aleksey Yeschenko added a comment -

          Shouldn't the fix be in the driver, not in cqlsh?

          cassandra-dbapi2 will likely not have another release, and python-driver, even if we switch to it tomorrow, doesn't support user types natively. So yes, it has to be done in cqlsh.

          Jeremiah Jordan added a comment -

          Or implement it in python-driver

          Aleksey Yeschenko added a comment -

          Eventually.

          Aleksey Yeschenko made changes -
          Link This issue relates to CASSANDRA-6438 [ CASSANDRA-6438 ]
          Mikhail Stepura made changes -
          Link This issue relates to CASSANDRA-6705 [ CASSANDRA-6705 ]
          Sylvain Lebresne made changes -
          Fix Version/s 2.1 beta1 [ 12326275 ]
          Fix Version/s 2.1 [ 12324159 ]
          DOAN DuyHai added a comment -

          Awesome feature again.

          Is it possible to have nested custom types? If yes, is there a depth limit other than the column max size to store the nested path (type1.type2.long)?

          Aleksey Yeschenko added a comment -

          Is it possible to have nested custom types? If yes, is there a depth limit other than the column max size to store the nested path (type1.type2.long)?

          Yes, it's possible. No, there is no limit.

          DOAN DuyHai added a comment - edited

          Another question, about a corner case.

          Currently CQL3 supports list, set & map.

          The set structure respects a set contract, meaning that there are no duplicates. With primitive types like text, bigint, date ... it's quite obvious how to define equality.

          Now with a custom user type, how do you manage equality? Suppose that I define the custom address type with several fields inside and I have a set<address>. How does Cassandra enforce uniqueness? Do you compare field by field?


            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Sylvain Lebresne
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              1
              Watchers:
              20
