Apache Cassandra / CASSANDRA-19874

Function inference may be unable to infer the correct function, or may choose one for a smaller type


Details

    • Type: Bug
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 5.x
    • Component/s: CQL/Interpreter
    • Labels: None
    • Bug Category: Correctness - Unrecoverable Corruption / Loss
    • Severity: Critical
    • Complexity: Normal
    • Discovered By: Fuzz Test
    • Platform: All
    • Impacts: None

    Description

      Here are 2 numeric examples where function inference doesn’t do the right thing

      1) varint

      Case 1: Wrong Bytes

      BigInteger pk = BigInteger.valueOf(-42 + -42);
      BigInteger ck = BigInteger.valueOf(42 + 42);
      BigInteger v = BigInteger.valueOf(-8200 + -16990);
      createTable("CREATE TABLE %s(pk varint, ck varint, v varint, PRIMARY KEY(pk, ck))");
      execute("INSERT INTO %s(pk, ck, v) VALUES (-42 + -42, 42 + 42, -8200 + -16990)");
      assertRows(execute("SELECT * FROM %s"),
                 row(pk, ck, v));
      
      assertRows(execute("SELECT * FROM %s WHERE pk=?", pk),
                 row(pk, ck, v));
      
      assertRows(execute("SELECT * FROM %s WHERE pk=? AND ck=?", pk, ck),
                 row(pk, ck, v));
      

      This fails with

      java.lang.AssertionError: Got less rows than expected. Expected 1 but got 0
      

      The reason is that the selected function, "system._add: (int, int) -> int", works on ints rather than varints, which means the bytes created do not match the bytes that a varint would have created!

      In the example of -42 + -42 we get -84, which is a single byte for varint but 4 bytes for int. Since the bytes don't match, partition/clustering equality does not match!
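      As a rough illustration (the class and helper names below are mine, not Cassandra's actual serializers), varint is serialized as a minimal two's-complement byte array while int is always 4 big-endian bytes, so the same value -84 produces different bytes:

      ```java
      import java.math.BigInteger;
      import java.nio.ByteBuffer;

      public class VarintVsIntBytes {
          // varint-style encoding: minimal two's-complement byte array
          static byte[] asVarint(long value) {
              return BigInteger.valueOf(value).toByteArray();
          }

          // int encoding: always 4 big-endian bytes
          static byte[] asInt(int value) {
              return ByteBuffer.allocate(4).putInt(value).array();
          }

          public static void main(String[] args) {
              System.out.println(asVarint(-84).length); // 1 byte:  AC
              System.out.println(asInt(-84).length);    // 4 bytes: FF FF FF AC
          }
      }
      ```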

      Case 2: Overflow

      BigInteger pk = BigInteger.valueOf(59463118).multiply(BigInteger.valueOf(-2171));
      BigInteger ck = pk;
      BigInteger v = pk;
      createTable("CREATE TABLE %s(pk varint, ck varint, v varint, PRIMARY KEY(pk, ck))");
      execute("INSERT INTO %s(pk, ck, v) VALUES (59463118 * -2171, 59463118 * -2171, 59463118 * -2171)");
      assertRows(execute("SELECT * FROM %s"),
                 row(pk, ck, v));
      

      This fails with

      java.lang.AssertionError: Invalid value for row 0 column 0 (pk of type varint), expected <-129094429178> but got <-245410298>
      Invalid value for row 0 column 1 (ck of type varint), expected <-129094429178> but got <-245410298>
      Invalid value for row 0 column 2 (v of type varint), expected <-129094429178> but got <-245410298>
      

      The reason for this is the same as above: we selected "system._multiply: (int, int) -> int", and if you do 59463118 * -2171 as an int it overflows and produces -245410298. If you instead promote the two ints to BigInteger and then do the multiplication you get -129094429178!
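      The overflow is easy to reproduce in plain Java, outside Cassandra (a standalone sketch; the method names are mine): the same product computed in int arithmetic wraps around 32 bits, while BigInteger arithmetic is exact.

      ```java
      import java.math.BigInteger;

      public class OverflowDemo {
          // what an (int, int) -> int overload effectively computes
          static int asIntMath() {
              return 59463118 * -2171; // wraps around 32 bits
          }

          // what varint semantics would require
          static BigInteger asVarintMath() {
              return BigInteger.valueOf(59463118).multiply(BigInteger.valueOf(-2171));
          }

          public static void main(String[] args) {
              System.out.println(asIntMath());    // -245410298
              System.out.println(asVarintMath()); // -129094429178
          }
      }
      ```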

      This isn't a problem for other databases; here is an example with SQLite:

      sqlite> create table foo2(pk varint);
      sqlite> insert into foo2(pk) values (59463118 * -2171);
      sqlite> select * from foo2;
         pk = -129094429178
      

      2) smallint

      createTable("CREATE TABLE %s(pk smallint, ck smallint, v smallint, PRIMARY KEY(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC)");
      execute("INSERT INTO %s(pk, ck, v) VALUES (-42 + -42, 42 + 42, -42 + 42)");
      

      Here the function selection fails because it can't settle on a single match

      org.apache.cassandra.exceptions.InvalidRequestException: Ambiguous '+' operation with args 42 and 42: use type hint to disambiguate, example '(int) ?'
      
      	at org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest(RequestValidations.java:370)
      	at org.apache.cassandra.cql3.functions.FunctionResolver.pickBestMatch(FunctionResolver.java:179)
      	at org.apache.cassandra.cql3.functions.FunctionResolver.get(FunctionResolver.java:87)
      	at org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:154)
      

      The reason here is similar to the above… we think that the numbers "-42" and "42" are of type "int" rather than "smallint", and no single "+" function can be picked!

      If instead you just did

       
      INSERT INTO %s(pk, ck, v) VALUES(-42, 42, 42)
      

      We would then be able to infer that these are “smallint” rather than upcasting them to “int”.
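      In the meantime, the type hint the error message suggests should work around the ambiguity. An untested sketch in the same test-harness style as the snippets above (whether hints on the literals, rather than on bind markers as in the error message's '(int) ?' example, resolve this here is an assumption):

      ```java
      // untested sketch: type hints pin the literals to smallint so the
      // '+' resolution has an unambiguous overload to pick
      execute("INSERT INTO %s(pk, ck, v) VALUES " +
              "((smallint) -42 + (smallint) -42, " +
              "(smallint) 42 + (smallint) 42, " +
              "(smallint) -42 + (smallint) 42)");
      ```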

      This isn't a problem for other databases; here is an example from SQLite:

      sqlite> create table foo(pk smallint);
      sqlite> insert into foo(pk) values (-42 + -42);
      sqlite> select * from foo;
         pk = -84
      


      People

        Assignee: Unassigned
        Reporter: David Capwell (dcapwell)