Cassandra / CASSANDRA-5959

CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate

    Description

      Impetus for this Request

      (from the original question on StackOverflow):

      I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting, I have all the data for the entire row ready to go (in memory):

      +---------+------+------+------+------+-------+
      |         | 0    | 1    | 2    | ...  | 49999 |
      | row_id  +------+------+------+------+-------+
      |         | text | text | text | ...  | text  |
      +---------+------+------+------+------+-------+
      

      The column names are integers, allowing slicing for pagination. Each column value is the text stored at that particular index.

      CQL3 table definition:

      create table results (
          row_id text,
          index int,
          value text,
          primary key (row_id, index)
      ) 
      with compact storage;
      
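      The pagination mentioned above would be a range query on the clustering column. As a hedged illustration (not code from this issue), a page of the table could be fetched by building a statement like this:

```python
# Hypothetical sketch: paginating over the integer column names with a
# CQL3 range predicate on the clustering column `index`. Statement text
# is built as a plain string purely for illustration.
def page_query(row_id, start, page_size):
    return ("SELECT index, value FROM results "
            "WHERE row_id = '%s' AND index >= %d AND index < %d;"
            % (row_id, start, start + page_size))
```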

      As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.

      The only approach I can find is to execute the following statement 50,000 times:

      INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
      

      where the first ? is an index counter (i) and the second ? is the text value to store at location i.
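      As a sketch of what this loop amounts to (illustrative Python, not the original Java Driver code), each of the 50,000 columns becomes one parameter set, i.e. one request per (index, value) pair:

```python
# Hypothetical sketch: the 50,000 bindings the prepared statement above
# would be executed with -- one round trip to the server per pair.
STMT = "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?);"

def parameter_sets(row_id, values):
    """Yield one (row_id, i, value) binding per column."""
    for i, value in enumerate(values):
        yield (row_id, i, value)

params = list(parameter_sets("my_row_id",
                             ["text%d" % i for i in range(50000)]))
```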

      With the Datastax Java Driver client and C* server on the same development machine, this took a full minute to execute.

      Oddly enough, the same 50,000 insert statements in a Datastax Java Driver Batch on the same machine took 7.5 minutes. I thought batches were supposed to be faster than individual inserts?
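      For reference, a driver-level batch roughly corresponds to a CQL3 BATCH statement. The sketch below (illustrative only, assuming an unlogged batch; it is not the driver's actual wire format) shows why such a batch still carries one full INSERT per column:

```python
# Hypothetical sketch: the CQL3 BATCH text that a 50,000-statement
# driver batch roughly corresponds to -- still one INSERT per column.
def build_batch(row_id, values):
    lines = ["BEGIN UNLOGGED BATCH"]
    for i, value in enumerate(values):
        lines.append(
            "  INSERT INTO results (row_id, index, value) "
            "VALUES ('%s', %d, '%s');" % (row_id, i, value)
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)
```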

      We tried instead with a Thrift client (Astyanax) and the same insert via a MutationBatch. This took 235 milliseconds.

      Feature Request

      As a result of this performance testing, this issue requests that CQL3 support batch mutations in a single operation (statement), so that CQL achieves the same speed/performance as existing Thrift clients.

      Example suggested syntax (based on the above example table/column family):

      insert into results (row_id, (index,value)) values 
          ((0,text0), (1,text1), (2,text2), ..., (N,textN));
      

      Each value in the values clause is a tuple: the first element is the column name, the second is the column value. This seems the simplest and most accurate representation of what happens during a batch insert/mutate.
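      To make the suggestion concrete, here is an illustrative rendering of the proposed syntax from in-memory (index, value) pairs (a sketch of the suggestion above; this is not syntax that CQL3 actually accepts):

```python
# Hypothetical sketch: rendering the proposed multi-column insert
# syntax from (index, value) pairs. The syntax is this issue's
# suggestion, not existing CQL3.
def proposed_insert(pairs):
    tuples = ", ".join("(%d,%s)" % (i, v) for i, v in pairs)
    return ("insert into results (row_id, (index,value)) values %s;"
            % tuples)
```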

      Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked) in favor of Astyanax because Astyanax supports this behavior. We desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the Driver.


      People

        Assignee: Unassigned
        Reporter: Les Hazlewood (lhazlewood)
        Votes: 0
        Watchers: 6