[CASSANDRA-5959] CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate) - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Normal
Resolution: Duplicate
Fix Version/s: None
Component/s: None
Labels:
- CQL

Description

Impetus for this Request

(from the original question on StackOverflow):

I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting, I have all the data for the entire row ready to go (in memory):

+---------+------+------+------+------+-------+
|         | 0    | 1    | 2    | ...  | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text | ...  | text  |
+---------+------+------+------+------+-------+

The column names are integers, allowing slicing for pagination. The column values are a value at that particular index.

CQL3 table definition:

create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) 
with compact storage;

As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.

The only approach I can find is to execute the following statement 50,000 times:

INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);

where the first ? is an index counter (i) and the second ? is the text value to store at location i.
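
As a rough sketch of what that loop amounts to, the Python generator below produces one parameter tuple per column, each of which would be bound to the prepared statement and executed separately. The name `bind_parameters` and the three-placeholder form of the statement (with `row_id` bound as a parameter too) are assumptions made for illustration; no driver call is shown.

```python
# Assumed prepared-statement text; the statement above binds only index and value.
INSERT_CQL = "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)"


def bind_parameters(row_id, values):
    """Yield one (row_id, index, value) tuple per column of the row.

    `values` is the in-memory list of text values; the position of each
    value in the list becomes its integer column name.
    """
    for i, text in enumerate(values):
        yield (row_id, i, text)


if __name__ == "__main__":
    # Hypothetical data: 50,000 values means 50,000 separate executions.
    values = ["text%d" % i for i in range(50000)]
    executions = sum(1 for _ in bind_parameters("my_row_id", values))
    print(executions)
```

Each tuple corresponds to one round trip to the server, which is why the total time grows with the number of columns.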

With the Datastax Java Driver client and C* server on the same development machine, this took a full minute to execute.

Oddly enough, the same 50,000 insert statements in a Datastax Java Driver Batch on the same machine took 7.5 minutes. I thought batches were supposed to be faster than individual inserts?
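
For reference, a batch of this kind written out as plain CQL wraps every INSERT in a single BEGIN BATCH ... APPLY BATCH statement. The Python sketch below renders that text for the table above. The helper names `cql_text_literal` and `render_batch` are hypothetical, and whether the driver actually sends the batch to the server in this textual form is an assumption, not something established in this report.

```python
def cql_text_literal(s):
    """Quote a string as a CQL text literal, doubling embedded single quotes."""
    return "'" + s.replace("'", "''") + "'"


def render_batch(row_id, values):
    """Render one CQL batch containing an INSERT per (index, value) pair."""
    lines = ["BEGIN BATCH"]
    for i, text in enumerate(values):
        lines.append(
            "  INSERT INTO results (row_id, index, value) VALUES (%s, %d, %s);"
            % (cql_text_literal(row_id), i, cql_text_literal(text))
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Small illustrative row; the report above used 50,000 values.
    print(render_batch("my_row_id", ["text0", "text1", "text2"]))
```

With 50,000 values the rendered statement contains 50,000 INSERT lines, all of which travel as one statement.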

We tried instead with a Thrift client (Astyanax) and the same insert via a MutationBatch. This took 235 milliseconds.

Feature Request

As a result of this performance testing, this issue requests that CQL3 support batch mutations as a single operation (statement), so that CQL3 clients get the same performance as existing Thrift clients.

Example suggested syntax (based on the above example table/column family):

insert into results (row_id, (index,value)) values 
    ((0,text0), (1,text1), (2,text2), ..., (N,textN));

Each value in the values clause is a tuple. The first tuple element is the column name and the second is the column value. This seems to be the simplest and most accurate representation of what happens during a batch insert/mutate.

Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked) in favor of Astyanax because Astyanax supports this behavior. We desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the Driver.

Issue Links

duplicates

CASSANDRA-4693 CQL Protocol should allow multiple PreparedStatements to be atomically executed

Resolved

is duplicated by

CASSANDRA-7654 CQL INSERT improvement

Resolved

People

Assignee: Unassigned

Reporter: Les Hazlewood

Votes: 0

Watchers: 6

Dates

Created: 30/Aug/13 18:26

Updated: 16/Apr/19 09:32

Resolved: 30/Aug/13 20:02