Cassandra
  1. Cassandra
  2. CASSANDRA-4210

Support for variadic parameters list for "in clause" in prepared cql query

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0.1
    • Component/s: Core
    • Labels:
      None
    • Environment:

      prepared cql queries

      Description

      This query

      select * from Town where key in (?)
      

      only allows one parameter for '?'.

      This means querying for 'Paris' and 'London' can't be executed in one step with this prepared statement.

      Current workarounds are:

      • either execute the prepared query 2 times with 'Paris' then 'London'
      • or prepare a new query select * from Town where key in (?, ?) and bind the 2 parameters

      Having a support for variadic parameters list with in clause could improve performance:

      • single hop to get the data
      • // fetching server side
      1. 4210.txt
        47 kB
        Sylvain Lebresne

        Issue Links

          Activity

          Hide
          Pierre Chalamet added a comment -

          This is still a problem when trying to bind an IN parameter for prepared statement even in 1.2.4. For what I've seen, the column spec returned after preparing

          select * from Town where key in (?)
          

          just tells that the parameter is of type 'key', not a set of type 'key'.

          This would be really nice for binary protocol driver to know they could bind a set of value for such parameter (and I'm pretty sure this info is known when the statement is prepared).

          Show
          Pierre Chalamet added a comment - This is still a problem when trying to bind an IN parameter for prepared statement even in 1.2.4. For what I've seen, the column spec returned after preparing select * from Town where key in (?) just tells that the parameter is of type 'key', not a set of type 'key'. This would be really nice for binary protocol driver to know they could bind a set of value for such parameter (and I'm pretty sure this info is known when the statement is prepared).
          Hide
          Sylvain Lebresne added a comment -

          just tells that the parameter is of type 'key', not a set of type 'key'.

          Yes, because that's like if you were writing

          select * from Town where key in (2)
          

          but with a marker.

          Which means that if we do support that, it'll probably have to use the syntax:

          select * from Town where key in ?
          

          for compatibility reasons.

          But I'd like to note that "multiget" queries are not necessary more efficient. It's not because the server don't don't anything smart about it, except parallelizing each individual queries (but that's really all it does). For that reason, doing the parallelization client side provides the benefit that you can start processing answers as they come, instead for waiting for the full result set. So even though you do "waste" a little more network bandwith between the client and server, I suspect that in a lot of use cases, you may actually get better throughput by parallelizing server side.

          Anyway, that digression aside, I have no problem adding variadic IN (up to the minor syntax detail above) if only because when IN is used for the clustering key (versus the partition one), then it does always improve performance.

          Show
          Sylvain Lebresne added a comment - just tells that the parameter is of type 'key', not a set of type 'key'. Yes, because that's like if you were writing select * from Town where key in (2) but with a marker. Which means that if we do support that, it'll probably have to use the syntax: select * from Town where key in ? for compatibility reasons. But I'd like to note that "multiget" queries are not necessary more efficient. It's not because the server don't don't anything smart about it, except parallelizing each individual queries (but that's really all it does). For that reason, doing the parallelization client side provides the benefit that you can start processing answers as they come, instead for waiting for the full result set. So even though you do "waste" a little more network bandwith between the client and server, I suspect that in a lot of use cases, you may actually get better throughput by parallelizing server side. Anyway, that digression aside, I have no problem adding variadic IN (up to the minor syntax detail above) if only because when IN is used for the clustering key (versus the partition one), then it does always improve performance.
          Hide
          Sylvain Lebresne added a comment -

          Attaching patch for this.

          Show
          Sylvain Lebresne added a comment - Attaching patch for this.
          Hide
          Aleksey Yeschenko added a comment -

          The code LGTM. Confirmed to work for UPDATE/SELECT, and I don't see any obvious regressions (neither do dtests). +1

          Show
          Aleksey Yeschenko added a comment - The code LGTM. Confirmed to work for UPDATE/SELECT, and I don't see any obvious regressions (neither do dtests). +1
          Hide
          Sylvain Lebresne added a comment -

          Committed, thanks

          Show
          Sylvain Lebresne added a comment - Committed, thanks
          Hide
          Alexandre Gaudencio added a comment -

          I would like to add that using variadic params for IN has different effects depending on the prepare option.

          I am using the nodejs driver.

          With this query :

          var query = 'SELECT * FROM event WHERE id IN (?)'
          var params = [ '139a42b0-6bef-11e4-b6d6-0582c8632278,02203b20-6bef-11e4-a5a2-95978cb98ca6']
          
          cql.execute(query, params, function (err, result) {
          	if (err) {
          		console.log(err)
          	} else {
          		console.log(result);
          	}
          })
          

          I get the result :

          { [ResponseError: UUID should be 16 or 0 bytes (369)]
            name: 'ResponseError',
            message: 'UUID should be 16 or 0 bytes (369)',
            info: 'Represents an error message from the server',
            code: 8704,
            query: 'SELECT * FROM event WHERE id IN (?)' }
          

          Whereas with a prepared statement, it goes like this :

          var query = 'SELECT * FROM event WHERE id IN (?)'
          var params = [ '139a42b0-6bef-11e4-b6d6-0582c8632278,02203b20-6bef-11e4-a5a2-95978cb98ca6']
          
          cql.execute(query, params, { prepare : true } , function (err, result) {
          	if (err) {
          		console.log(err)
          	} else {
          		console.log(result);
          	}
          })
          
          { rows:
             [ { __columns: [Object],
                 id: '139a42b0-6bef-11e4-b6d6-0582c8632278',
                 name: someName,
                 type: 'someType',
                 value: '{"id":"someId"}' } ],
            meta:
             { global_tables_spec: true,
               keyspace: 'someKeyspace',
               table: 'event',
               columns:
                [ [Object],
                  [Object],
                  [Object],
                  [Object],
                  _col_id: 0,
                  _col_name: 1,
                  _col_type: 2,
                  _col_value: 3 ] },
            _queriedHost: 'xxx.xxx.xxx.xxx' }
          

          The behavior is inconsistent.
          In the first case I don't any result but an error instead stating that (as I understand it) the parser only excepted one UUID parameter and the received arg is too long.
          In the second case, the query is ran without a problem except for the fact that only the first UUID is considered, the rest being ditched. And no error has been emitted. (This query should have returned two rows)

          Show
          Alexandre Gaudencio added a comment - I would like to add that using variadic params for IN has different effects depending on the prepare option. I am using the nodejs driver. With this query : var query = 'SELECT * FROM event WHERE id IN (?)' var params = [ '139a42b0-6bef-11e4-b6d6-0582c8632278,02203b20-6bef-11e4-a5a2-95978cb98ca6'] cql.execute(query, params, function (err, result) { if (err) { console.log(err) } else { console.log(result); } }) I get the result : { [ResponseError: UUID should be 16 or 0 bytes (369)] name: 'ResponseError', message: 'UUID should be 16 or 0 bytes (369)', info: 'Represents an error message from the server', code: 8704, query: 'SELECT * FROM event WHERE id IN (?)' } Whereas with a prepared statement, it goes like this : var query = 'SELECT * FROM event WHERE id IN (?)' var params = [ '139a42b0-6bef-11e4-b6d6-0582c8632278,02203b20-6bef-11e4-a5a2-95978cb98ca6'] cql.execute(query, params, { prepare : true } , function (err, result) { if (err) { console.log(err) } else { console.log(result); } }) { rows: [ { __columns: [Object], id: '139a42b0-6bef-11e4-b6d6-0582c8632278', name: someName, type: 'someType', value: '{"id":"someId"}' } ], meta: { global_tables_spec: true, keyspace: 'someKeyspace', table: 'event', columns: [ [Object], [Object], [Object], [Object], _col_id: 0, _col_name: 1, _col_type: 2, _col_value: 3 ] }, _queriedHost: 'xxx.xxx.xxx.xxx' } The behavior is inconsistent. In the first case I don't any result but an error instead stating that (as I understand it) the parser only excepted one UUID parameter and the received arg is too long. In the second case, the query is ran without a problem except for the fact that only the first UUID is considered, the rest being ditched. And no error has been emitted. (This query should have returned two rows)
          Hide
          Sylvain Lebresne added a comment -

          I strongly suspect that the first error is actually due to the nodejs driver. Most probably, the nodejs driver doesn't have enough type information in the non-prepared case to properly serialize your uuid as a uuid. Instead, it probably serialize it as a string which is incorrect. You should report this to the nodejs driver authors to figure this out, but the error pretty clearly indicate that something wrong has been sent to the server and that's almost surely not a server bug.

          Regarding the 2nd case, you do not provided any information to assert that the query should indeed have returned two rows. Maybe you're 2nd UUID actually doesn't exist in the DB and the answer is correct. Or, here again, maybe this is a nodejs driver bug. Because we have a fair amount tests for IN queries so the likeliness of a server not returning the proper data is not extremely high a priori. Overall I suggest that 1) you double-check that your query should indeed return 2 rows, 2) if it should indeed that you first report it to the nodejs driver to see if it's not a nodejs driver bug and 3) if it does turn out that it's likely a server bug, that you open a new separate ticket with a bit more information on your case (like at least the Cassandra version in use) and full reproduction steps if possible.

          Show
          Sylvain Lebresne added a comment - I strongly suspect that the first error is actually due to the nodejs driver. Most probably, the nodejs driver doesn't have enough type information in the non-prepared case to properly serialize your uuid as a uuid. Instead, it probably serialize it as a string which is incorrect. You should report this to the nodejs driver authors to figure this out, but the error pretty clearly indicate that something wrong has been sent to the server and that's almost surely not a server bug. Regarding the 2nd case, you do not provided any information to assert that the query should indeed have returned two rows. Maybe you're 2nd UUID actually doesn't exist in the DB and the answer is correct. Or, here again, maybe this is a nodejs driver bug. Because we have a fair amount tests for IN queries so the likeliness of a server not returning the proper data is not extremely high a priori. Overall I suggest that 1) you double-check that your query should indeed return 2 rows, 2) if it should indeed that you first report it to the nodejs driver to see if it's not a nodejs driver bug and 3) if it does turn out that it's likely a server bug, that you open a new separate ticket with a bit more information on your case (like at least the Cassandra version in use) and full reproduction steps if possible.
          Hide
          Alexandre Gaudencio added a comment - - edited

          Thanks. I'm quite convinced that this is related to the nodejs driver so I'll just report this on github.

          [edit] Sorry for the edit but there's no place on the github repo to open issues for the nodejs driver. Do you know what the best place is to reach the team in charge of this driver ?

          Show
          Alexandre Gaudencio added a comment - - edited Thanks. I'm quite convinced that this is related to the nodejs driver so I'll just report this on github. [edit] Sorry for the edit but there's no place on the github repo to open issues for the nodejs driver. Do you know what the best place is to reach the team in charge of this driver ?
          Hide
          Tyler Hobbs added a comment -

          Do you know what the best place is to reach the team in charge of this driver ?

          The JIRA for the nodejs driver can be found here: https://datastax-oss.atlassian.net/browse/NODEJS

          Show
          Tyler Hobbs added a comment - Do you know what the best place is to reach the team in charge of this driver ? The JIRA for the nodejs driver can be found here: https://datastax-oss.atlassian.net/browse/NODEJS

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Pierre Chalamet
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              5 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development