Cassandra / CASSANDRA-4693

CQL Protocol should allow multiple PreparedStatements to be atomically executed

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: Core
    • Labels:

      Description

      Currently the only way to insert multiple records on the same partition key atomically using PreparedStatements is a CQL BATCH command. Unfortunately, the number of records to insert must then be known before preparing the statement, which is rarely the case. The only workaround that keeps atomicity is therefore to use unprepared statements, which means sending a bulk of CQL strings and is fairly inefficient.

      Therefore the CQL protocol should allow clients to send multiple PreparedStatements to be executed with the same guarantees and semantics as the CQL BATCH command.
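      To make the limitation concrete, here is a small Java sketch. It assumes a hypothetical table `events (key, ts, payload)` and builds the text of a BATCH that inserts n rows using bind markers. Because the statement text, and therefore the number of values to bind, changes with n, a statement prepared for one row count cannot be reused for another.

```java
// Hypothetical illustration: the text of a prepared CQL BATCH depends on how
// many rows it carries. Table and column names are made up.
public class FixedSizeBatch {

    // Builds "BEGIN BATCH ... APPLY BATCH" with one INSERT per row, each using
    // bind markers (?) for its three values.
    static String batchQuery(int rows) {
        StringBuilder sb = new StringBuilder("BEGIN BATCH\n");
        for (int i = 0; i < rows; i++) {
            sb.append("  INSERT INTO events (key, ts, payload) VALUES (?, ?, ?);\n");
        }
        return sb.append("APPLY BATCH").toString();
    }

    // Counts bind markers: the client must supply exactly this many values.
    static int bindMarkers(String cql) {
        return (int) cql.chars().filter(c -> c == '?').count();
    }

    public static void main(String[] args) {
        // Two row counts yield two different statements to prepare.
        System.out.println(bindMarkers(batchQuery(2)));            // 6
        System.out.println(bindMarkers(batchQuery(3)));            // 9
        System.out.println(batchQuery(2).equals(batchQuery(3)));   // false
    }
}
```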


          Activity

          Sylvain Lebresne added a comment -

          One way to allow that would be to let the executePrepared message of the protocol take a list of (preparedId, values) instead of just one (we would obviously refuse the query if the list size is > 1 and it contains anything other than modification statements). Not sure we want to bother with this on the CQL-over-thrift side, however (I don't).
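          A minimal sketch of that acceptance rule, under assumptions: the `Execution` record, the `StatementKind` enum and `validate` are hypothetical names, not Cassandra classes. A list with a single entry is accepted as-is, while a longer list is rejected as soon as it contains a non-modification statement such as a SELECT.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical model of an execute message carrying several
// (preparedId, values) pairs. StatementKind stands in for whatever the
// server knows about each prepared statement.
public class MultiExecute {

    enum StatementKind { SELECT, INSERT, UPDATE, DELETE }

    record Execution(byte[] preparedId, StatementKind kind, List<ByteBuffer> values) {}

    static boolean isModification(StatementKind kind) {
        return kind != StatementKind.SELECT;
    }

    // One execution may be any statement; more than one is only accepted
    // if every entry is a modification statement.
    static void validate(List<Execution> executions) {
        if (executions.isEmpty()) {
            throw new IllegalArgumentException("empty execution list");
        }
        if (executions.size() == 1) {
            return;
        }
        for (Execution e : executions) {
            if (!isModification(e.kind())) {
                throw new IllegalArgumentException(
                        "only modification statements can be executed together, got " + e.kind());
            }
        }
    }
}
```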

          Michaël Figuière added a comment -

          Considering the key role of batches in CQL3, it would actually be very interesting to allow clients to send a list of both prepared and unprepared statements to be executed with the same semantics and guarantees as a BATCH command. This would allow application developers and frameworks to prepare most of their queries and include an additional one that is generated at runtime. An example use case that would leverage such a feature is when a bunch of columns need to be saved along with a set of complex collection mutations.
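          The mixed list suggested here can be modeled as a sealed type with one case per kind of child. This is an illustrative sketch only: `MixedBatch`, `Prepared` and `Unprepared` are hypothetical names, and bound values are kept as plain Java objects rather than serialized bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch whose children are either prepared statements
// (referenced by id, with bound values) or query strings built at runtime.
public class MixedBatch {

    sealed interface Child permits Prepared, Unprepared {}

    record Prepared(String id, List<Object> values) implements Child {}

    record Unprepared(String query) implements Child {}

    private final List<Child> children = new ArrayList<>();

    // Values must be non-null here, since List.of rejects nulls.
    MixedBatch addPrepared(String id, Object... values) {
        children.add(new Prepared(id, List.of(values)));
        return this;
    }

    MixedBatch addQuery(String cql) {
        children.add(new Unprepared(cql));
        return this;
    }

    List<Child> children() {
        return List.copyOf(children);
    }

    long preparedCount() {
        return children.stream().filter(c -> c instanceof Prepared).count();
    }
}
```

          For the use case above, a client might queue a prepared insert of the regular columns plus one runtime-generated collection update, e.g. `new MixedBatch().addPrepared("insert-user", 42, "alice").addQuery("UPDATE users SET tags = tags + {'x'} WHERE id = 42")`, where the statement id, table and values are made up.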

          Rick Shaw added a comment -

          The JDBC spec has a well-tested solution to this problem. The subject is covered in section 14.1.4, "PreparedStatement Objects", under "Batch Updates". It is probably worth a look.

          The summary is that it makes a list of prepared statement entries and their associated parameters and keeps it under a controlling statement. A C* implementation might be to create a list of both the prepared statement token and its list of binding values. Keeping the list of bound values tightly coupled with each prepared statement token greatly simplifies the binding alignment when the number of operations is large.
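          The JDBC pattern referred to here is `PreparedStatement.addBatch()` followed by `executeBatch()`. The toy class below imitates it to show the coupling described above: each queued entry stores the statement token together with a copy of its own bound values. `BoundBatch` and its methods are hypothetical and are not part of JDBC or of any Cassandra driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy imitation of the JDBC batch pattern: parameters are set on a
// statement, then addBatch() snapshots them together with the statement's
// token. Not a real driver API.
public class BoundBatch {

    // One entry keeps the prepared token and its own bound values together.
    record Entry(String token, List<Object> values) {}

    static final class Statement {
        private final String token;
        private final Object[] params;
        private final List<Entry> sink;

        Statement(String token, int paramCount, List<Entry> sink) {
            this.token = token;
            this.params = new Object[paramCount];
            this.sink = sink;
        }

        // 1-based index, as in JDBC setters.
        Statement set(int index, Object value) {
            params[index - 1] = value;
            return this;
        }

        // Copies the current parameters so that later set() calls do not
        // alter entries that were already queued.
        void addBatch() {
            sink.add(new Entry(token, Arrays.asList(params.clone())));
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    Statement prepare(String token, int paramCount) {
        return new Statement(token, paramCount, entries);
    }

    List<Entry> entries() {
        return List.copyOf(entries);
    }
}
```

          Because `addBatch()` copies the parameter array, rebinding a statement for the next row leaves earlier entries untouched, which keeps every value aligned with its statement however large the batch grows.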

          Jonathan Ellis added a comment -

          We're only talking about updates here, right? Not returning multiple resultsets?

          Michaël Figuière added a comment -

          Right. This feature is mostly about bringing atomic batch guarantees to sets of PreparedStatements of any size. There will probably be some performance improvement from executing them in a batch, but it will be small, as the binary protocol is able to pipeline requests. As for read requests, returning multiple resultsets would not only be unusual, it would also be unnecessary, as there are no interesting guarantees to gain there.

          Sylvain Lebresne added a comment -

          Attaching a patch for this. It adds a new BATCH message to the protocol that allows passing a list of either query strings (+ optional variables for one-shot binding) or prepared statement ids + variables, and batches all of them server side.

          I ran a small manual test and it seems to work correctly.
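          One way to lay out such a BATCH body on the wire is sketched below. It loosely follows the native protocol v2 BATCH message (a type byte, a query count, then per query a kind byte followed by either a query string or a prepared id and its values, then a consistency level), but the field widths are assumptions to check against the protocol spec, and `BatchEncoder` is a hypothetical name, not the code from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical encoder for a BATCH body mixing query strings and prepared
// ids. Field widths loosely follow native protocol v2 and are assumptions.
public class BatchEncoder {

    sealed interface Query permits StringQuery, PreparedQuery {}

    // Kind 0: a CQL string plus optional one-shot bound values.
    record StringQuery(String cql, List<byte[]> values) implements Query {}

    // Kind 1: the id returned by a PREPARE, plus its bound values.
    record PreparedQuery(byte[] id, List<byte[]> values) implements Query {}

    static byte[] encode(byte type, List<Query> queries, short consistency) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(type);                   // 0 = LOGGED, 1 = UNLOGGED, 2 = COUNTER
            out.writeShort(queries.size());        // number of queries
            for (Query q : queries) {
                List<byte[]> values;
                if (q instanceof StringQuery s) {
                    byte[] cql = s.cql().getBytes(StandardCharsets.UTF_8);
                    out.writeByte(0);              // kind: query string
                    out.writeInt(cql.length);      // [long string]: int length + UTF-8
                    out.write(cql);
                    values = s.values();
                } else {
                    PreparedQuery p = (PreparedQuery) q;
                    out.writeByte(1);              // kind: prepared id
                    out.writeShort(p.id().length); // [short bytes]: short length + bytes
                    out.write(p.id());
                    values = p.values();
                }
                out.writeShort(values.size());     // number of bound values
                for (byte[] v : values) {
                    out.writeInt(v.length);        // [bytes]: int length + bytes
                    out.write(v);
                }
            }
            out.writeShort(consistency);           // consistency level code
        } catch (IOException e) {
            // ByteArrayOutputStream does not actually fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```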

          Aleksey Yeschenko added a comment -

          Could you rebase? It no longer applies because of trigger changes to BatchStatement.

          Sylvain Lebresne added a comment -

          Rebased version attached.

          Aleksey Yeschenko added a comment -

          Can't get it to build: there are two execute() methods in BatchStatement with the same erasure.

          Sylvain Lebresne added a comment -

          Interesting, that didn't seem to bother my javac. Anyway, I've reattached the patch with one of the offending methods renamed.
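          For context on the build error: Java generics are erased at compile time, so two overloads whose parameters differ only in type arguments, such as `execute(List<String>)` and `execute(List<Integer>)`, both erase to `execute(List)`, and javac rejects them with a name clash. The sketch shows the kind of rename that resolves it; the method names and parameter types are hypothetical, not the ones from the patch.

```java
import java.util.List;

// These two overloads would NOT compile together, because both erase to
// execute(List):
//
//   static String execute(List<String> queries) { ... }
//   static String execute(List<Integer> ids) { ... }
//
// Giving one of them a distinct name removes the clash.
// Names and parameter types here are hypothetical.
public class ErasureClash {

    static String execute(List<String> queries) {
        return "strings:" + queries.size();
    }

    static String executeWithIds(List<Integer> ids) {
        return "ids:" + ids.size();
    }
}
```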

          Aleksey Yeschenko added a comment -
          • The 4.1.9 BATCH section number is wrong; it should be 4.1.7
          • QueryProcessor.processBatch() should call checkAccess() first, then validate(), to avoid leaking info on keyspace/table existence to unauthenticated users
          • BatchMessage.toType() has an error: 2 should map to COUNTER, not to UNLOGGED

          Other than that LGTM
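          The two code-level review points can be sketched as follows. The byte codes (0 = LOGGED, 1 = UNLOGGED, 2 = COUNTER) follow the review comment; `BatchReview`, the `Stmt` interface and the `processBatch` signature are hypothetical stand-ins for the real server classes. Checking access for every statement before validating any of them is one ordering that satisfies the review: an unauthorized client then gets an access error rather than schema-dependent validation errors.

```java
import java.util.List;

// Hypothetical sketch of the two review fixes: the type-byte mapping and
// the order of permission checks versus validation.
public class BatchReview {

    enum BatchType { LOGGED, UNLOGGED, COUNTER }

    static BatchType toType(byte b) {
        switch (b) {
            case 0: return BatchType.LOGGED;
            case 1: return BatchType.UNLOGGED;
            case 2: return BatchType.COUNTER; // the reviewed patch mapped 2 to UNLOGGED
            default: throw new IllegalArgumentException("Invalid BATCH message type " + b);
        }
    }

    interface Stmt {
        void checkAccess(); // throws if the client lacks permission
        void validate();    // may reveal whether a keyspace/table exists
    }

    // Access is checked for all statements before any of them is validated.
    static void processBatch(List<Stmt> statements) {
        for (Stmt s : statements) {
            s.checkAccess();
        }
        for (Stmt s : statements) {
            s.validate();
        }
    }
}
```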

          Sylvain Lebresne added a comment -

          Committed with the points above fixed. Thanks!


            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Michaël Figuière
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              1
              Watchers:
              9
