CASSANDRA-15096

[RFC CQL v4+] cql_extension: wide range of unset values.






      The current implementation of unset_value regularly fails (see Issues).
      We need to implement a new unset_value(s) mechanism which is robust and will work well for v4+ protocols.


      1- A client has to encode unset_value for every unset column
      in the bound values of a prepared INSERT.

      Example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) values(?,?,?,?,?,?);

      An execute query must unset the columns one by one, encoding each unset_value as "int(-2)":

      a- bound-values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value) or
      b- bound-values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value) etc.

      This increases the size of the execute query's binary buffer, which in turn increases the bandwidth and latency of both request and response.
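
To make the overhead concrete, here is a minimal sketch of how the bound values are laid out today, assuming the v4 native protocol value encoding (a 4-byte big-endian length, with -1 for null and -2 for "not set"). The names `encode_values` and `UNSET` are illustrative and not taken from any driver.

```python
import struct

NULL_MARKER = -1   # length marker for a null value
UNSET_MARKER = -2  # length marker for "not set"

UNSET = object()   # illustrative sentinel for an unset column


def encode_values(values):
    """Encode bound values; every unset column costs its own 4-byte marker."""
    out = bytearray()
    for v in values:
        if v is UNSET:
            out += struct.pack(">i", UNSET_MARKER)
        elif v is None:
            out += struct.pack(">i", NULL_MARKER)
        else:
            out += struct.pack(">i", len(v)) + v
    return bytes(out)


# Case (a): three real values followed by three unset columns.
case_a = encode_values([b"pk", b"ck", b"c1", UNSET, UNSET, UNSET])
```

With 2-byte placeholder values, the three trailing unset columns alone take 12 of the 30 bytes, and that cost grows linearly with the number of unset columns.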

      2- The buffer returned by a select query does not differentiate between null and unset_value for a subset of the given rows.

      Imagine a dataset in the table where each row of the returned select response has different
      unset/null columns. Consider the following query:
      SELECT * FROM table where pkey = pkey_value;
      with a page_size = 3 rows:


      pkey        ckey        col1              col2              col3              col4
      pkey_value  ckey_value  col1_value        null/unset_value  null/unset_value  null/unset_value
      pkey_value  ckey_value  null/unset_value  null/unset_value  null/unset_value  col4_value
      pkey_value  ckey_value  null/unset_value  null/unset_value  col3_value        null/unset_value


      Proposed solution

      Instead of having only null(-1) and unset_value(-2), extend the unset markers
      to a range from unset_(-2) to unset_(-2,147,483,648),
      where unset_value = unset_(-2) (one column unset),
      unset_rest = unset_(-2,147,483,648) (all remaining columns unset),
      and anything in between is unset_(neg_integer), which marks |neg_integer| - 1 consecutive columns as unset.
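
The marker range can be summarised in a few lines. This is only a sketch of one reading of the proposal, in which unset_(-n) covers n - 1 columns, so that unset_(-2) keeps its current single-column meaning. The function name `classify` is hypothetical.

```python
NULL_MARKER = -1
UNSET_VALUE = -2              # existing marker: exactly one unset column
UNSET_REST = -2_147_483_648   # int32 minimum: all remaining columns are unset


def classify(marker, remaining_columns):
    """Return (kind, columns_covered) for a 4-byte length marker."""
    if marker >= 0:
        return ("value", 1)            # marker is the byte length of one cell
    if marker == NULL_MARKER:
        return ("null", 1)
    if marker == UNSET_REST:
        return ("unset", remaining_columns)
    return ("unset", -marker - 1)      # unset_(-n) covers n - 1 columns
```

For example, `classify(-4, 4)` reports three unset columns, while `classify(UNSET_REST, 4)` reports all four remaining columns.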

      Solution for issue_1:

      a- bound-values = (pkey_value, ckey_value, col1_value, unset_rest)
      b- bound-values = (pkey_value, ckey_value, unset_(-4), col4_value)
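
A client-side encoder for this scheme might look like the sketch below. It collapses each run of unset columns into a single marker and uses unset_rest when the run reaches the end of the row. The names `encode_compact` and `UNSET` are illustrative, and the value framing (4-byte big-endian length prefix) is assumed to match the existing v4 encoding.

```python
import struct

UNSET = object()              # illustrative sentinel for an unset column
UNSET_REST = -2_147_483_648   # proposed "all remaining columns unset" marker


def encode_compact(values):
    """Encode bound values, folding runs of unset columns into one marker."""
    out = bytearray()
    i, n = 0, len(values)
    while i < n:
        v = values[i]
        if v is UNSET:
            j = i
            while j < n and values[j] is UNSET:
                j += 1                    # find the end of the unset run
            if j == n:
                out += struct.pack(">i", UNSET_REST)
            else:
                out += struct.pack(">i", -((j - i) + 1))  # run of k -> unset_(-(k+1))
            i = j
        elif v is None:
            out += struct.pack(">i", -1)
            i += 1
        else:
            out += struct.pack(">i", len(v)) + v
            i += 1
    return bytes(out)


case_a = encode_compact([b"pk", b"ck", b"c1", UNSET, UNSET, UNSET])
case_b = encode_compact([b"pk", b"ck", UNSET, UNSET, UNSET, b"c4"])
```

With 2-byte placeholder values both cases take 22 bytes instead of 30, and the size no longer depends on how many consecutive columns are unset.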

      Solution for issue_2:

      This works with all select responses, prepared or unprepared.

      row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
      This lets the decoder move on to the next row.

      row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
      This lets the decoder skip |-4+1| = 3 columns of the column metadata and resume decoding at col4 for the next cell_value in the row.

      row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
      This buffer combines the row1 and row2 cases.
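
On the driver side, decoding a row could be sketched as follows. The decoder walks the buffer, expands each unset marker back into the columns it covers, and stops once the column count from the metadata is reached. `decode_row`, `cell`, `marker` and `UNSET` are hypothetical helper names; the framing is assumed to be the same 4-byte length prefix as above.

```python
import struct

UNSET = "UNSET"               # illustrative placeholder for an unset cell
UNSET_REST = -2_147_483_648


def decode_row(buf, column_count, pos=0):
    """Decode one row; return (cells, position of the next row)."""
    row = []
    while len(row) < column_count:
        (m,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if m >= 0:
            row.append(buf[pos:pos + m])
            pos += m
        elif m == -1:
            row.append(None)
        else:
            remaining = column_count - len(row)
            skip = remaining if m == UNSET_REST else -m - 1
            if skip > remaining:
                raise ValueError("unset marker runs past the end of the row")
            row.extend([UNSET] * skip)
    return row, pos


def cell(data):
    return struct.pack(">i", len(data)) + data


def marker(n):
    return struct.pack(">i", n)


# row3 from the text: unset_(-3) skips col1 and col2, unset_rest covers col4.
row3 = cell(b"pk") + cell(b"ck") + marker(-3) + cell(b"c3") + marker(UNSET_REST)
decoded, consumed = decode_row(row3, 6)
```

The returned position lets the caller continue with the next row in the same page buffer.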

      This solution is not limited to unset_(neg-int); it can also be used for null cells in responses to decrease the bandwidth between CQL and the client.

      To stay compatible with all current v4+ CQL drivers, we should require the client to send a flag with the select query request (either in the frame header or somewhere in the CQL statement),
      and for the returned buffer we could use the rows flags (e.g. has_unset_values?: boolean) to let the driver know whether such markers exist in the page.
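
The rows-flag idea could be checked by a driver roughly as below. The first three flag values are the existing v4 Rows metadata flags. `HAS_UNSET_VALUES` and its bit position are purely hypothetical and chosen for illustration; an actual protocol change would have to reserve a bit that is unused in every supported version.

```python
# Existing v4 Rows metadata flags.
GLOBAL_TABLES_SPEC = 0x0001
HAS_MORE_PAGES = 0x0002
NO_METADATA = 0x0004

# Hypothetical flag for this proposal; the bit value is an assumption.
HAS_UNSET_VALUES = 0x0010


def page_has_unset_markers(flags):
    """Return True if the page may contain extended unset markers."""
    return bool(flags & HAS_UNSET_VALUES)


def choose_decoder(flags):
    """Pick the row decoding mode a driver would use for this page."""
    return "extended" if page_has_unset_markers(flags) else "legacy"
```

A driver that never sends the request flag would never see the bit set, so it would keep using its existing decoder unchanged.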


      -Implementing this will let apps design complex data models with up to 2 billion columns without trade-offs.

      -It greatly reduces the number of prepared write statements needed in data models with millions of columns.

      -It has a large impact on bandwidth and CPU cycles.
      -It is easy to implement on the client side.

      Record of votes

      +1 Louay Kamel




            Unassigned Unassigned
            louay Louay Kamel