Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-17719

CEP-15: Multi-Partition Transaction CQL Support (Alpha)

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • NA
    • Accord, CQL/Syntax
    • None

    Description

      The dev list thread regarding CQL transaction support seems to have converged on a syntax.
       
      The thread is here: https://lists.apache.org/thread/5sds3968mnnk42c24pvgwphg4qvo2xk0
       
      The message proposing the syntax ultimately agreed on is here: https://lists.apache.org/thread/y289tczngj68bqpoo7gkso3bzmtf86pl
       
      I'll describe my understanding of  the agreed syntax here for, but I'd suggest reading through the thread.
       
      The example query is this:

      BEGIN TRANSACTION
        LET car = (SELECT miles_driven, is_running FROM cars WHERE model=’pinto’);
        LET user = (SELECT miles_driven FROM users WHERE name=’Blake’);
        SELECT car.is_running, car.miles_driven;
        IF car.is_running THEN
          UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
          UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
        END IF
      COMMIT TRANSACTION
      

      Sections are described below, and we want to require the statement enforces an order on the different types of clauses. First, assignments, then select(s), then conditional updates. This may be relaxed in the future, but is meant to prevent users from interleaving updates and assignments and expecting read your own write behavior that we don't want to support in v1. 

      Reference assignments

        LET car = (SELECT miles_driven, is_running FROM cars WHERE model=’pinto’)

       
      The first part is basically assigning the result of a SELECT statement to a named reference that can be used in updates/conditions and be returned to the user. Tuple names belong to a global scope and must not clash with other LET statements or update statements (more on that in the updates section). Also, the select statement must either be a point read, with all primary key columns defined, or explicitly set a limit of 1. 

      Selection

      Data to returned to client. Currently, we're only supporting a single select statement. Either a normal select statement, or one returning values assigned by LET statements as shown in the example. Ultimately we'll want to support multiple select statements and returning the results to the client. Although that will require a protocol change.

      Updates

      Normal inserts/updates/deletes with the following extensions:

      • Inserts and updates can reference values assigned earlier in the statement
      • Updates can reference their own columns:
        miles_driven = miles_driven + 30

         - or -

        miles_driven += 30

        These will of course need to generate any required reads behind the scenes. There's no precedence of table vs reference names, so if a relevant column name and reference name conflict, the query needs to fail to parse.

      If blocks 

        IF <some conditions> THEN
          <some update>;
          <another update>;
        END IF
      

       
      For v1, we only need to support a single condition in the format above. In the future, we'd like to support multiple branches with multiple conditions like:
       

        IF <some conditions> THEN
          <some update>;
          <another update>;
        ELSE IF <another condition> THEN
          <yet another update>;
        ELSE
          <an update>;
        END IF
      

       

      Conditions

      Comparisons of value references to literals or other references. IS NULL / IS NOT NULL also needs to be supported. Multiple comparisons need to be supported, but v1 only needs to support AND'ing them together.

      Supported operators: =, !=, >, >=, <, <=, IS NULL, IS NOT NULL
      <value_reference> = 5
      <value_reference> IS NOT NULL
      
      IF car IS NOT NULL AND car.miles_driven > 30
      IF car.miles_driven = user.miles_driven

      (Note that IS[ NOT ]NULL can apply to both tuple references and individually dereferenced columns.)
      CONTAINS and CONTAINS KEY require indexed collections, and will not be supported in v1.

      Implementation notes

      I have a proof of concept I'd created to demo the initially proposed syntax here: https://github.com/bdeggleston/cassandra/tree/accord-cql-poc,  It could be used as a starting point, a source of ideas for approaches, or not used at all. The main thing to keep in mind is that value references prevent creating read commands and mutations until later in the transaction process, and potentially on another machine, which means we can't create accord transactions with fully formed mutations and read commands. CQL statements and terms are also not serializable, and would require a ton of work to properly serialize, so there will need to be some intermediate stage that can be serialized.

      Attachments

        Issue Links

          Activity

            People

              maedhroz Caleb Rackliffe
              bdeggleston Blake Eggleston
              Blake Eggleston, Caleb Rackliffe, David Capwell
              Ariel Weisberg, David Capwell
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 12.5h
                  12.5h