Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-7423

Allow updating individual subfields of UDT

    Details

      Description

      Since user defined types were implemented in CASSANDRA-5590 as blobs (you have to rewrite the entire type in order to make any modifications), they can't be safely used without LWT for any operation that wants to modify a subset of the UDT's fields by any client process that is not authoritative for the entire blob.

      When trying to use UDTs to model complex records (particularly with nesting), this is not an exceptional circumstance, this is the totally expected normal situation.

      The use of UDTs for anything non-trivial is harmful to either performance or consistency or both.

      edit: to clarify, i believe that most potential uses of UDTs should be considered anti-patterns until/unless we have field-level r/w access to individual elements of the UDT, with individual timestamps and standard LWW semantics

        Issue Links

          Activity

          Hide
          thobbs Tyler Hobbs added a comment -

          Thanks, committed to trunk as 677230df694752c7ecf6d5459eee60ad7cf45ecf.

          Show
          thobbs Tyler Hobbs added a comment - Thanks, committed to trunk as 677230df694752c7ecf6d5459eee60ad7cf45ecf .
          Hide
          blerer Benjamin Lerer added a comment -

          Your change seems to have fixed the problem
          Thanks.

          +1

          Show
          blerer Benjamin Lerer added a comment - Your change seems to have fixed the problem Thanks. +1
          Hide
          thobbs Tyler Hobbs added a comment -

          can you add an entry to the CQL doc changelog and include a link to the ticket in that entry

          I did have an entry, but I've added the ticket number there as well.

          whereas I would have expected <identifier> on the following line (or even better: a b )

          Hmm, I can't reproduce getting <identifier> on the same line. Can we investigate that in another ticket? However, I have taken your suggestion of autocompleting the UDT fields where possible (update assignments and conditions).

          Show
          thobbs Tyler Hobbs added a comment - can you add an entry to the CQL doc changelog and include a link to the ticket in that entry I did have an entry, but I've added the ticket number there as well. whereas I would have expected <identifier> on the following line (or even better: a b ) Hmm, I can't reproduce getting <identifier> on the same line. Can we investigate that in another ticket? However, I have taken your suggestion of autocompleting the UDT fields where possible (update assignments and conditions).
          Hide
          blerer Benjamin Lerer added a comment - - edited

          I see a weird behavior when playing with auto-completion.

          cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
          cqlsh> use test;
          cqlsh:test> create type t (a int, b int);
          cqlsh:test> create table test (pk int primary key, u t, v int);
          cqlsh:test> update test set  <TAB>
          u v
          cqlsh:test> update test set u <TAB>
          = .
          

          but if I press TAB after update test SET u. I get:

          cqlsh:test> update test SET u.<identifier>
          

          whereas I would have expected <identifier> on the following line (or even better: a b )

          Show
          blerer Benjamin Lerer added a comment - - edited I see a weird behavior when playing with auto-completion. cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> use test; cqlsh:test> create type t (a int , b int ); cqlsh:test> create table test (pk int primary key, u t, v int ); cqlsh:test> update test set <TAB> u v cqlsh:test> update test set u <TAB> = . but if I press TAB after update test SET u. I get: cqlsh:test> update test SET u.<identifier> whereas I would have expected <identifier> on the following line (or even better: a b )
          Hide
          slebresne Sylvain Lebresne added a comment -

          Very minor nit (haven't looked at the patch) but can you add an entry to the CQL doc changelog and include a link to the ticket in that entry (I know we used to not do the latter but lets try to take the habit to as it was suggested to me it would be useful to some people and that's easy to do).

          Show
          slebresne Sylvain Lebresne added a comment - Very minor nit (haven't looked at the patch) but can you add an entry to the CQL doc changelog and include a link to the ticket in that entry (I know we used to not do the latter but lets try to take the habit to as it was suggested to me it would be useful to some people and that's easy to do).
          Hide
          thobbs Tyler Hobbs added a comment -

          Thanks for the thorough review!

          I've fixed all of your nits in a single commit. I also pushed two separate commits to update the CQL docs and cqlsh autocompletion.

          Show
          thobbs Tyler Hobbs added a comment - Thanks for the thorough review! I've fixed all of your nits in a single commit. I also pushed two separate commits to update the CQL docs and cqlsh autocompletion.
          Hide
          blerer Benjamin Lerer added a comment -

          The patch seems pretty good.

          The 2 missing things are the CQL documentation and the CQLSH auto completion for field update/deletion.

          Otherwise, I have found the following nits:

          • In AbstractMarker and FunctionCall the else are not needed as all the if end with return statements.
          • It seems to me that the signature of Operation.RawUpdate::prepare method could be improved by removing the keyspace name (it can be retrieved from cfm.ksName) and making cfm the first parameter.
          • In AbstractFunction and Selector I think that receiver.type.isFrozenCollection() || (receiver.type.isUDT() && !receiver.type.isMultiCell()) should be changed.
            In my opinion, we could add a isFreezable method to AbstractType (receiver.type.isFreezable() && !receiver.type.isMultiCell()) or use a isFrozenUDT method.
          • In the TulpeType constructor it seems that the code:
            for (int i = 0; i < types.size(); i++)
            {
                AbstractType<?> type = types.get(i);
                if (freezeInner)
                    type = type.freeze();
                types.set(i, type);
            }
            

            could be written:

            if (freezeInner)
                types.forEach(AbstractType::freeze);
            
          • In CollectionType::serializeForNativeProtocol the def parameter is not used and can be removed (it was there before the patch).
          • It might make sense to pass the protocol version to the UDType::serializeForNativeProtocol like for CollectionType::serializeForNativeProtocol even if it is not used for the moment.
          • I like a lot the beforeAndAfterFlush method used in UserTypesTest. Could you put in CQLTester instead? There is several places where I would like to use it.
          Show
          blerer Benjamin Lerer added a comment - The patch seems pretty good. The 2 missing things are the CQL documentation and the CQLSH auto completion for field update/deletion. Otherwise, I have found the following nits: In AbstractMarker and FunctionCall the else are not needed as all the if end with return statements. It seems to me that the signature of Operation.RawUpdate::prepare method could be improved by removing the keyspace name (it can be retrieved from cfm.ksName) and making cfm the first parameter. In AbstractFunction and Selector I think that receiver.type.isFrozenCollection() || (receiver.type.isUDT() && !receiver.type.isMultiCell()) should be changed. In my opinion, we could add a isFreezable method to AbstractType ( receiver.type.isFreezable() && !receiver.type.isMultiCell() ) or use a isFrozenUDT method. In the TulpeType constructor it seems that the code: for ( int i = 0; i < types.size(); i++) { AbstractType<?> type = types.get(i); if (freezeInner) type = type.freeze(); types.set(i, type); } could be written: if (freezeInner) types.forEach(AbstractType::freeze); In CollectionType::serializeForNativeProtocol the def parameter is not used and can be removed (it was there before the patch). It might make sense to pass the protocol version to the UDType::serializeForNativeProtocol like for CollectionType::serializeForNativeProtocol even if it is not used for the moment. I like a lot the beforeAndAfterFlush method used in UserTypesTest . Could you put in CQLTester instead? There is several places where I would like to use it.
          Hide
          thobbs Tyler Hobbs added a comment -

          That makes sense – I'm not sure why I didn't think of that yesterday. I've force-pushed an update to the branch that checks for non-frozen collections nested within non-frozen UDTs when creating a table.

          Show
          thobbs Tyler Hobbs added a comment - That makes sense – I'm not sure why I didn't think of that yesterday. I've force-pushed an update to the branch that checks for non-frozen collections nested within non-frozen UDTs when creating a table.
          Hide
          slebresne Sylvain Lebresne added a comment -

          we do not currently require collections inside UDT definitions to be declared with frozen<>. They are always implicitly frozen.

          It's not really that they are implicitly frozen, it's that we only allow frozne UDT and frozenness reaches deep. As soon as you froze something, everything nested is also frozen (rather intuitively I would add), and so I don't think there is anything wrong with the current behavior. But if this patch only allow non-frozen UDT at top-level, then I think we should force people to have nested fields frozen. In other words, we currently allow

          CREATE TYPE foo (c set<text>);
          CREATE TABLE bar (k int PRIMARY KEY, t frozen<foo>);
          

          and that's fine, but we don't and still shouldn't allow with this patch:

          CREATE TABLE bar (k int PRIMARY KEY, t foo);
          

          given the same definition of foo. What we should allow is:

          CREATE TYPE foo (c frozen<set<text>>);
          CREATE TABLE bar (k int PRIMARY KEY, t foo);
          

          Assuming we do that, which I strongly think we should, I don't see a backward compatibility problem supporting nesting non-frozen stuffs.

          Show
          slebresne Sylvain Lebresne added a comment - we do not currently require collections inside UDT definitions to be declared with frozen<> . They are always implicitly frozen. It's not really that they are implicitly frozen, it's that we only allow frozne UDT and frozenness reaches deep. As soon as you froze something, everything nested is also frozen (rather intuitively I would add), and so I don't think there is anything wrong with the current behavior. But if this patch only allow non-frozen UDT at top-level, then I think we should force people to have nested fields frozen. In other words, we currently allow CREATE TYPE foo (c set<text>); CREATE TABLE bar (k int PRIMARY KEY, t frozen<foo>); and that's fine, but we don't and still shouldn't allow with this patch: CREATE TABLE bar (k int PRIMARY KEY, t foo); given the same definition of foo . What we should allow is: CREATE TYPE foo (c frozen<set<text>>); CREATE TABLE bar (k int PRIMARY KEY, t foo); Assuming we do that, which I strongly think we should, I don't see a backward compatibility problem supporting nesting non-frozen stuffs.
          Hide
          thobbs Tyler Hobbs added a comment - - edited

          Initial patch and CI tests:

          branch testall dtest
          CASSANDRA-7423 testall dtest

          I've also opened a dtest pull request that fixes the failing user_functions_test.TestUserFunctions.udf_with_udt_test).

          This adds support for non-frozen UDTs, which support single-field updates and deletions. I've also added LWT support for UDTs (both on the value of individual fields and the entire UDT) to match what we support with collections. This patch does not add support for nesting non-frozen UDTs and collections – nested types must still be frozen.

          I would like to defer the optimization of selecting a single field from a non-frozen UDT to another ticket. (Currently, we fetch all fields from disk, then extract the selected fields.) We have roughly two months until the 3.6 release, and I have some higher priority work that I'd like to handle first.

          There's also one tricky related issue: we do not currently require collections inside UDT definitions to be declared with frozen<>. They are always implicitly frozen. If we ever want to support nesting non-frozen collections inside non-frozen UDTs (without introducing new syntax or breaking backwards compat), we need to deprecate and warn on the current behavior, and then require frozen<>. That can also be done in a separate ticket, but I wanted to raise the issue here.

          Show
          thobbs Tyler Hobbs added a comment - - edited Initial patch and CI tests: branch testall dtest CASSANDRA-7423 testall dtest I've also opened a dtest pull request that fixes the failing user_functions_test.TestUserFunctions.udf_with_udt_test ). This adds support for non-frozen UDTs, which support single-field updates and deletions. I've also added LWT support for UDTs (both on the value of individual fields and the entire UDT) to match what we support with collections. This patch does not add support for nesting non-frozen UDTs and collections – nested types must still be frozen. I would like to defer the optimization of selecting a single field from a non-frozen UDT to another ticket. (Currently, we fetch all fields from disk, then extract the selected fields.) We have roughly two months until the 3.6 release, and I have some higher priority work that I'd like to handle first. There's also one tricky related issue: we do not currently require collections inside UDT definitions to be declared with frozen<> . They are always implicitly frozen. If we ever want to support nesting non-frozen collections inside non-frozen UDTs (without introducing new syntax or breaking backwards compat), we need to deprecate and warn on the current behavior, and then require frozen<> . That can also be done in a separate ticket, but I wanted to raise the issue here.
          Hide
          thobbs Tyler Hobbs added a comment -

          Tagging this client-impacting because drivers might be unconditionally wrapping UDTs in frozen<>.

          Show
          thobbs Tyler Hobbs added a comment - Tagging this client-impacting because drivers might be unconditionally wrapping UDTs in frozen<> .
          Hide
          jbellis Jonathan Ellis added a comment -

          (Assigning to Tyler since he has 7826 already.)

          Show
          jbellis Jonathan Ellis added a comment - (Assigning to Tyler since he has 7826 already.)
          Hide
          snazy Robert Stupp added a comment -

          One question (just for understanding):
          Would this mean that each deeply nested field is persisted individually?
          For a user-type in a user-type it would lead to a column-name upperTypeCol:fieldA:deepTypeFieldA (with imaginary column "upperTypeCol" of type "upperType".
          But a set or list or map with user types or collections - wouldn't these require "blob serialization"?
          AFAIK are {{set}}s serialized with column names like "columnName:setElementValue".
          I think "blob serialization" must occur at a certain nesting/usage level - for example when a UDT is used as a map value - or when a UDT is used in a list (for convenience, when list elements are inserted/removed).

          Show
          snazy Robert Stupp added a comment - One question (just for understanding): Would this mean that each deeply nested field is persisted individually? For a user-type in a user-type it would lead to a column-name upperTypeCol:fieldA:deepTypeFieldA (with imaginary column "upperTypeCol" of type "upperType". But a set or list or map with user types or collections - wouldn't these require "blob serialization"? AFAIK are {{set}}s serialized with column names like "columnName:setElementValue". I think "blob serialization" must occur at a certain nesting/usage level - for example when a UDT is used as a map value - or when a UDT is used in a list (for convenience, when list elements are inserted/removed).
          Hide
          iamaleksey Aleksey Yeschenko added a comment -
          • there is more or less a consensus about UDTs as blobs in its current form being of very limited usefulness
          • making UDTs sub-field resolvable would most likely require the fields to be stored in separate cells
          • thus a rather painful migration will be required (like supercolumns, CASSANDRA-3237, bugs from which we are still occasionally finding)
          • a lot of other huge changes are planned for 3.0. Tackling them all at once, and UDT migrations on top, might end up being a hellish experience

          So I have a controversial suggesting here, especially with 2.1.0 being so close. Maybe (maybe) we should delay UDTs altogether until 3.0, where we'll make them properly sub-field resolvable. The native protocol part won't have to change - we can just remove CREATE/ALTER/DROP TYPE from Cql.g and the docs for 2.1.0, and remove the rest of the code (that needs removal) in 2.1.1+.

          Show
          iamaleksey Aleksey Yeschenko added a comment - there is more or less a consensus about UDTs as blobs in its current form being of very limited usefulness making UDTs sub-field resolvable would most likely require the fields to be stored in separate cells thus a rather painful migration will be required (like supercolumns, CASSANDRA-3237 , bugs from which we are still occasionally finding) a lot of other huge changes are planned for 3.0. Tackling them all at once, and UDT migrations on top, might end up being a hellish experience So I have a controversial suggesting here, especially with 2.1.0 being so close. Maybe (maybe) we should delay UDTs altogether until 3.0, where we'll make them properly sub-field resolvable. The native protocol part won't have to change - we can just remove CREATE/ALTER/DROP TYPE from Cql.g and the docs for 2.1.0, and remove the rest of the code (that needs removal) in 2.1.1+.
          Hide
          brianmhess Brian Hess added a comment -

          I think what we want from UDTs is that the individual elements of the UDT are atomic pieces of data, just like columns of a CQL table. They should be able to be retrieved on their own (which they are as implemented), as well as inserted/updated on their own (without having to get/modify/set the whole UDT). The latter one is not supported as implemented and is a real issue.

          One thing that comes out of them not being atomic pieces of data is that there is no safe way to update the UDT, but that is just one example.

          Show
          brianmhess Brian Hess added a comment - I think what we want from UDTs is that the individual elements of the UDT are atomic pieces of data, just like columns of a CQL table. They should be able to be retrieved on their own (which they are as implemented), as well as inserted/updated on their own (without having to get/modify/set the whole UDT). The latter one is not supported as implemented and is a real issue. One thing that comes out of them not being atomic pieces of data is that there is no safe way to update the UDT, but that is just one example.

            People

            • Assignee:
              thobbs Tyler Hobbs
              Reporter:
              tupshin Tupshin Harper
              Reviewer:
              Benjamin Lerer
              Tester:
              Philip Thompson
            • Votes:
              11 Vote for this issue
              Watchers:
              30 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development