Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 3.2
    • Component/s: None
    • Labels:

      Description

      When selecting an avg() of int values, the type of the avg value returned is an int as well, meaning it's rounded off to an incorrect answer. This is both incorrect and inconsistent with other databases.

      Example:

      cqlsh:test> select * from monkey where id = 1;

      id | i | v
      ------
      1 | 1 | 1
      1 | 2 | 1
      1 | 3 | 2

      (3 rows)
      cqlsh:test> select avg(v) from monkey where id = 1;

      system.avg(v)
      ---------------
      1

      (1 rows)

      I tried avg() with MySQL, here's the result:

      mysql> create table blah ( id int primary key, v int );
      Query OK, 0 rows affected (0.15 sec)

      mysql> insert into blah set id = 1, v = 1;
      Query OK, 1 row affected (0.02 sec)

      mysql> insert into blah set id = 1, v = 1;
      ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
      mysql> insert into blah set id = 2, v = 1;
      Query OK, 1 row affected (0.01 sec)

      mysql> insert into blah set id = 3, v = 2;
      Query OK, 1 row affected (0.01 sec)

      mysql> select avg(v) from blah;
      --------

      avg(v)

      --------

      1.3333

      --------
      1 row in set (0.00 sec)

      I created a new table using the above query. The result:

      mysql> create table foo as select avg(v) as a from blah;
      Query OK, 1 row affected, 1 warning (0.04 sec)
      Records: 1 Duplicates: 0 Warnings: 1

      mysql> desc foo;
      ---------------------------------------+

      Field Type Null Key Default Extra

      ---------------------------------------+

      a decimal(14,4) YES   NULL  

      ---------------------------------------+
      1 row in set (0.01 sec)

      It works the same way in postgres, and to my knowledge, every RDBMs.

      Broken in 2.2, 3.0.

      1. cassandra-2.2-10310.txt
        4 kB
        Brett Snyder
      2. cassandra-3.0-10310.txt
        4 kB
        Brett Snyder

        Activity

        Hide
        bsnyder788 Brett Snyder added a comment -

        Patch against 2.2.x branch. Will patch to 3.x too. Let me know if and where I should add new tests for this functionality. Thanks!

        Show
        bsnyder788 Brett Snyder added a comment - Patch against 2.2.x branch. Will patch to 3.x too. Let me know if and where I should add new tests for this functionality. Thanks!
        Hide
        blerer Benjamin Lerer added a comment -

        It works the same way in postgres, and to my knowledge, every RDBMs.

        Both Oracle and Microsoft SQL server return an int if the input type is an int.
        *Oracle documentation
        *Transact-SQL documentation

        If you want to read the full discussion about aggregates return types, you can found it here.

        Show
        blerer Benjamin Lerer added a comment - It works the same way in postgres, and to my knowledge, every RDBMs. Both Oracle and Microsoft SQL server return an int if the input type is an int . * Oracle documentation * Transact-SQL documentation If you want to read the full discussion about aggregates return types, you can found it here .
        Hide
        rustyrazorblade Jon Haddad added a comment -

        SQL server and Oracle allow you to cast a column and get whatever value you want. This decision is a little odd to me, since we're effectively saying "you choose an int for your column so you don't get precision", and don't offer a way around it. I can't think of a case where you'd prefer to have a less precise result from avg().

        Show
        rustyrazorblade Jon Haddad added a comment - SQL server and Oracle allow you to cast a column and get whatever value you want. This decision is a little odd to me, since we're effectively saying "you choose an int for your column so you don't get precision", and don't offer a way around it. I can't think of a case where you'd prefer to have a less precise result from avg().
        Hide
        blerer Benjamin Lerer added a comment -

        SQL server and Oracle allow you to cast a column and get whatever value you want.

        C* does the same. You can also cast your columns and get the results that you want. Type casting as been implemented in CASSANDRA-5226.

        Show
        blerer Benjamin Lerer added a comment - SQL server and Oracle allow you to cast a column and get whatever value you want. C* does the same. You can also cast your columns and get the results that you want. Type casting as been implemented in CASSANDRA-5226 .
        Hide
        slebresne Sylvain Lebresne added a comment -

        SQL server and Oracle allow you to cast a column and get whatever value you want.

        Yes, and that's what we should support imo (if for no other reasons than because that's a solution that doesn't break backward compatibility). In fact, that's what I suggested in my comment and to be honest I though we supported it, but appears that fell through the cracks.

        Show
        slebresne Sylvain Lebresne added a comment - SQL server and Oracle allow you to cast a column and get whatever value you want. Yes, and that's what we should support imo (if for no other reasons than because that's a solution that doesn't break backward compatibility). In fact, that's what I suggested in my comment and to be honest I though we supported it, but appears that fell through the cracks.
        Hide
        slebresne Sylvain Lebresne added a comment -

        You can also cast your columns and get the results that you want.

        No you can't. We definitively don't support type casing the selections, it's not even parsed.

        Show
        slebresne Sylvain Lebresne added a comment - You can also cast your columns and get the results that you want. No you can't. We definitively don't support type casing the selections, it's not even parsed.
        Hide
        slebresne Sylvain Lebresne added a comment -

        Ok, renamed the ticket so we fix that oversight.

        Show
        slebresne Sylvain Lebresne added a comment - Ok, renamed the ticket so we fix that oversight.
        Hide
        blerer Benjamin Lerer added a comment -

        My mistake, I misunderstood Sylvain Lebresne and Tyler Hobbs comments in CASSANDRA-5226.
        I thought that casting was already implemented.

        Show
        blerer Benjamin Lerer added a comment - My mistake, I misunderstood Sylvain Lebresne and Tyler Hobbs comments in CASSANDRA-5226 . I thought that casting was already implemented.
        Hide
        blerer Benjamin Lerer added a comment -

        I proposed to support the following casting:

        ascii -> ascii, text, varchar		
        bigint -> bigint, decimal, double, float, int, tinyint, smallint
        tinyint -> bigint, decimal, double, float, int, tinyint, smallint
        smallint -> bigint, decimal, double, float, int, tinyint, smallint	
        blob 	
        boolean -> ascii, boolean, text, varchar
        counter -> ascii, bigint, decimal, double, float, int, tinyint, smallint	
        date -> date, timestamp	
        decimal -> bigint, decimal, double, float, int, tinyint, smallint	
        double -> bigint, decimal, double, float, int, tinyint, smallint	
        float -> bigint, decimal, double, float, int, tinyint, smallint	
        inet -> ascii, text, varchar	
        int -> bigint, decimal, double, float, int, tinyint, smallint		
        text -> text, varchar		
        time 	
        timestamp -> date, timestamp
        timeuuid -> date, timestamp, timeuuid 	
        uuid	
        varchar -> text, varchar	
        varint -> bigint, decimal, double, float, int, tinyint, smallint
        

        Feedbacks are wellcome.

        Show
        blerer Benjamin Lerer added a comment - I proposed to support the following casting: ascii -> ascii, text, varchar bigint -> bigint, decimal, double , float , int , tinyint, smallint tinyint -> bigint, decimal, double , float , int , tinyint, smallint smallint -> bigint, decimal, double , float , int , tinyint, smallint blob boolean -> ascii, boolean , text, varchar counter -> ascii, bigint, decimal, double , float , int , tinyint, smallint date -> date, timestamp decimal -> bigint, decimal, double , float , int , tinyint, smallint double -> bigint, decimal, double , float , int , tinyint, smallint float -> bigint, decimal, double , float , int , tinyint, smallint inet -> ascii, text, varchar int -> bigint, decimal, double , float , int , tinyint, smallint text -> text, varchar time timestamp -> date, timestamp timeuuid -> date, timestamp, timeuuid uuid varchar -> text, varchar varint -> bigint, decimal, double , float , int , tinyint, smallint Feedbacks are wellcome.
        Hide
        JoshuaMcKenzie Joshua McKenzie added a comment -

        Are we looking to constrain to logical casts that we think make sense, or casts that are supported by the underlying data types in the storage engine? If the latter, date <=> int, time <=> long, timestamp <=> long, etc should all be in there too. Basically anything that satisfies the overridden isCompatibleWith method.

        My vote would be to support anything we can support and give users the flexibility of casting.

        Show
        JoshuaMcKenzie Joshua McKenzie added a comment - Are we looking to constrain to logical casts that we think make sense, or casts that are supported by the underlying data types in the storage engine? If the latter, date <=> int , time <=> long , timestamp <=> long , etc should all be in there too. Basically anything that satisfies the overridden isCompatibleWith method. My vote would be to support anything we can support and give users the flexibility of casting.
        Hide
        slebresne Sylvain Lebresne added a comment -

        Just as a fyi, I actually don't think we can use type casting for this, we might have to rely on conversion functions for that (though there is potentially other options). I don't really have time to go into the details of why today however, so I'll update with more details next week or something.

        With that said, I generally agree with the idea of allowing as much as is reasonable, but I also think that the exact details on what is reasonable is the least important part of this ticket, and we should actually probably start with something conservative, since adding new conversion/casting we've forgotten is easy, it's removing some that turns out to not be such a good idea that is hard.

        Show
        slebresne Sylvain Lebresne added a comment - Just as a fyi, I actually don't think we can use type casting for this, we might have to rely on conversion functions for that (though there is potentially other options). I don't really have time to go into the details of why today however, so I'll update with more details next week or something. With that said, I generally agree with the idea of allowing as much as is reasonable, but I also think that the exact details on what is reasonable is the least important part of this ticket, and we should actually probably start with something conservative, since adding new conversion/casting we've forgotten is easy, it's removing some that turns out to not be such a good idea that is hard.
        Hide
        JoshuaMcKenzie Joshua McKenzie added a comment -

        adding new conversion/casting we've forgotten is easy, it's removing some that turns out to not be such a good idea that is hard.

        Good point. +1 to starting conservatively.

        Show
        JoshuaMcKenzie Joshua McKenzie added a comment - adding new conversion/casting we've forgotten is easy, it's removing some that turns out to not be such a good idea that is hard. Good point. +1 to starting conservatively.
        Hide
        blerer Benjamin Lerer added a comment -

        The patch is here.
        For the syntax I choosed to use the syntax used by most of the relational databases CAST( x AS double).
        The implementation use internally native scalar functions to perform the conversions. An other option would have been to create a new Selector.
        The advantage of using the functions framework was that it was allowing to reuse existing conversion functions.

        The patch add support for the following casts:

        ascii -> text, varchar		
        bigint -> decimal, double, float, int, tinyint, smallint, varint
        tinyint -> decimal, bigint, double, float, int, smallint, varint
        smallint -> decimal, bigint, double, float, int, tinyint, varint
        boolean -> ascii, text, varchar
        counter -> decimal, bigint, double, float, int, tinyint, smallint, varint
        date -> timestamp	
        decimal -> bigint, double, float, int, tinyint, smallint, varint
        double -> decimal, bigint, float, int, tinyint, smallint, varint
        float -> decimal, bigint, double, int, tinyint, smallint, varint
        inet -> ascii, text, varchar	
        int -> decimal, bigint, double, float, tinyint, smallint, varinttime 	
        timestamp -> date
        timeuuid -> date, timestamp	
        varint -> decimal, bigint, double, float, int, tinyint, smallint
        

        C* will ignore any unnecessary cast (like casting an int to an int).

        CI:

        • The unit test results are here
        • The dtest results are here
        Show
        blerer Benjamin Lerer added a comment - The patch is here . For the syntax I choosed to use the syntax used by most of the relational databases CAST( x AS double) . The implementation use internally native scalar functions to perform the conversions. An other option would have been to create a new Selector . The advantage of using the functions framework was that it was allowing to reuse existing conversion functions. The patch add support for the following casts: ascii -> text, varchar bigint -> decimal, double , float , int , tinyint, smallint, varint tinyint -> decimal, bigint, double , float , int , smallint, varint smallint -> decimal, bigint, double , float , int , tinyint, varint boolean -> ascii, text, varchar counter -> decimal, bigint, double , float , int , tinyint, smallint, varint date -> timestamp decimal -> bigint, double , float , int , tinyint, smallint, varint double -> decimal, bigint, float , int , tinyint, smallint, varint float -> decimal, bigint, double , int , tinyint, smallint, varint inet -> ascii, text, varchar int -> decimal, bigint, double , float , tinyint, smallint, varinttime timestamp -> date timeuuid -> date, timestamp varint -> decimal, bigint, double , float , int , tinyint, smallint C* will ignore any unnecessary cast (like casting an int to an int). CI: The unit test results are here The dtest results are here
        Hide
        snazy Robert Stupp added a comment -

        Was curious how it looks and lurked into the patch.
        Overall it looks really nice. I'd just like to have numeric-to-string, (time)uuid-to-string, timestamp-to-time.
        The only thing that could be improved is the special casing in Selectable.newSelectorFactory which relies on string magic, which feels a bit fragile. Would it be possible for example to pass a flag or enum from Selectable.WithFunction.Raw to Selectable.WithFunction so that we can skip all the string magic?
        If we're going with the string magic, it would be nice to have a unit test using a preselected keyspace (USE some_keyspace).

        Show
        snazy Robert Stupp added a comment - Was curious how it looks and lurked into the patch. Overall it looks really nice. I'd just like to have numeric-to-string, (time)uuid-to-string, timestamp-to-time. The only thing that could be improved is the special casing in Selectable.newSelectorFactory which relies on string magic, which feels a bit fragile. Would it be possible for example to pass a flag or enum from Selectable.WithFunction.Raw to Selectable.WithFunction so that we can skip all the string magic? If we're going with the string magic, it would be nice to have a unit test using a preselected keyspace ( USE some_keyspace ).
        Hide
        blerer Benjamin Lerer added a comment -

        I am also not fully happy with this string magic. I will try to come up with a better approach.

        Show
        blerer Benjamin Lerer added a comment - I am also not fully happy with this string magic. I will try to come up with a better approach.
        Hide
        blerer Benjamin Lerer added a comment - - edited

        Thanks for the feedback Robert Stupp.

        I force pushed the new patch.
        It removes all the String magic by using a specific Selector for CAST but it still rely on the function framework. By consequence if the functions are exposed through the system tables the supported casts will also be.

        I have added conversions to ascii and text types.

        Show
        blerer Benjamin Lerer added a comment - - edited Thanks for the feedback Robert Stupp . I force pushed the new patch. It removes all the String magic by using a specific Selector for CAST but it still rely on the function framework. By consequence if the functions are exposed through the system tables the supported casts will also be. I have added conversions to ascii and text types.
        Hide
        snazy Robert Stupp added a comment -

        Overall the patch looks really good!

        Just a couple of really minor/trivial things (except the first one). I'd be fine with fixing them on commit.

        • CastFcts and CastFcsTest are missing the copyright header
        • StrBuilder in WithCast.toString can be replaced with a simple string concatenation
        • Selectable line 314 needs reformat
        • Should CastFcsTest read CastFctsTest ?
        • CastFcts.all can use method references for the to-byte,short,int,long,float,double conversions in the for-loop (e.g. Number::byteValue). It feels easier for the JVM to use method refs instead of lambdas - but I’m unsure whether it really is.
        • CQL.textile might need a sentence that we strictly rely on Java’s semantics for the conversions. E.g. the double for 1 will be converted to the string value ”1.0” and not just to ”1” (as Python for example does).
        • maybe also add the following snippet to testCastsWithReverseOrder method (or any other method). Just wanted to see whether nested casts and casts with UDFs work.
                  assertRows(execute("SELECT CAST(CAST(a AS tinyint) AS smallint), " +
                                     "CAST(CAST(b AS tinyint) AS smallint), " +
                                     "CAST(CAST(c AS tinyint) AS smallint) FROM %s"),
                             row((short) 1, (short) 2, (short) 6));
          
                  assertRows(execute("SELECT CAST(CAST(CAST(a AS tinyint) AS double) AS text), " +
                                     "CAST(CAST(CAST(b AS tinyint) AS double) AS text), " +
                                     "CAST(CAST(CAST(c AS tinyint) AS double) AS text) FROM %s"),
                             row("1.0", "2.0", "6.0"));
          
                  String f = createFunction(KEYSPACE, "int",
                                            "CREATE FUNCTION %s(val int) " +
                                            "RETURNS NULL ON NULL INPUT " +
                                            "RETURNS double " +
                                            "LANGUAGE java " +
                                            "AS 'return (double)val;'");
          
                  assertRows(execute("SELECT " + f + "(CAST(b AS int)) FROM %s"),
                             row((double) 2));
          
                  assertRows(execute("SELECT CAST(" + f + "(CAST(b AS int)) AS text) FROM %s"),
                             row("2.0"));
          
        Show
        snazy Robert Stupp added a comment - Overall the patch looks really good! Just a couple of really minor/trivial things (except the first one). I'd be fine with fixing them on commit. CastFcts and CastFcsTest are missing the copyright header StrBuilder in WithCast.toString can be replaced with a simple string concatenation Selectable line 314 needs reformat Should CastFcsTest read CastFctsTest ? CastFcts.all can use method references for the to-byte,short,int,long,float,double conversions in the for-loop (e.g. Number::byteValue ). It feels easier for the JVM to use method refs instead of lambdas - but I’m unsure whether it really is. CQL.textile might need a sentence that we strictly rely on Java’s semantics for the conversions. E.g. the double for 1 will be converted to the string value ”1.0” and not just to ”1” (as Python for example does). maybe also add the following snippet to testCastsWithReverseOrder method (or any other method). Just wanted to see whether nested casts and casts with UDFs work. assertRows(execute( "SELECT CAST(CAST(a AS tinyint) AS smallint), " + "CAST(CAST(b AS tinyint) AS smallint), " + "CAST(CAST(c AS tinyint) AS smallint) FROM %s" ), row(( short ) 1, ( short ) 2, ( short ) 6)); assertRows(execute( "SELECT CAST(CAST(CAST(a AS tinyint) AS double ) AS text), " + "CAST(CAST(CAST(b AS tinyint) AS double ) AS text), " + "CAST(CAST(CAST(c AS tinyint) AS double ) AS text) FROM %s" ), row( "1.0" , "2.0" , "6.0" )); String f = createFunction(KEYSPACE, " int " , "CREATE FUNCTION %s(val int ) " + "RETURNS NULL ON NULL INPUT " + "RETURNS double " + "LANGUAGE java " + "AS ' return ( double )val;'" ); assertRows(execute( "SELECT " + f + "(CAST(b AS int )) FROM %s" ), row(( double ) 2)); assertRows(execute( "SELECT CAST(" + f + "(CAST(b AS int )) AS text) FROM %s" ), row( "2.0" ));
        Hide
        snazy Robert Stupp added a comment -

        So, yea - LGTM. Just fix the copyright thing and those minor/trivial things you feel comfortable with. CI results LGTM, too ; at least nothing related to your patch.
        (I guess, fix version is 3.2).

        Show
        snazy Robert Stupp added a comment - So, yea - LGTM. Just fix the copyright thing and those minor/trivial things you feel comfortable with. CI results LGTM, too ; at least nothing related to your patch. (I guess, fix version is 3.2).
        Hide
        blerer Benjamin Lerer added a comment -

        In fact the CastFcts unit test is failling.
        It seems to be some Timezone issue. I will have to investigate what is wrong and have another CI run.

        Show
        blerer Benjamin Lerer added a comment - In fact the CastFcts unit test is failling . It seems to be some Timezone issue. I will have to investigate what is wrong and have another CI run.
        Hide
        snazy Robert Stupp added a comment - - edited

        Huh? That's strange! I'm sure your utests worked fine on my machine and they are fine now, too.

        EDIT:
        Oh! Line 274 in CastFcts should be return outputType().decompose(builder.toString()); (and the last assertion in testTimeCastsInSelectionClause needs to be adjusted, too).
        And since I'm in the same TZ as you, the tests didn't complain on my machine.

        Show
        snazy Robert Stupp added a comment - - edited Huh? That's strange! I'm sure your utests worked fine on my machine and they are fine now, too. EDIT: Oh! Line 274 in CastFcts should be return outputType().decompose(builder.toString()); (and the last assertion in testTimeCastsInSelectionClause needs to be adjusted, too). And since I'm in the same TZ as you, the tests didn't complain on my machine.
        Hide
        snazy Robert Stupp added a comment -

        Just a minor typo in CQL.textile (should read Java's semantics - the apostrophe is some crazy unicode char) - you can fix that on commit.
        +1 (pending CI results)

        Show
        snazy Robert Stupp added a comment - Just a minor typo in CQL.textile (should read Java's semantics - the apostrophe is some crazy unicode char) - you can fix that on commit. +1 (pending CI results)
        Hide
        blerer Benjamin Lerer added a comment -

        @snazy Thanks for the review.
        Unit tests were fine. DTests were suffering from timeout issues but it does not look related to the changes.

        Show
        blerer Benjamin Lerer added a comment - @snazy Thanks for the review. Unit tests were fine. DTests were suffering from timeout issues but it does not look related to the changes.
        Hide
        blerer Benjamin Lerer added a comment -

        Committed in 3.2 at 269c5d4f85d6e4edd1a100533c9adb88a36ade70

        Show
        blerer Benjamin Lerer added a comment - Committed in 3.2 at 269c5d4f85d6e4edd1a100533c9adb88a36ade70

          People

          • Assignee:
            blerer Benjamin Lerer
            Reporter:
            rustyrazorblade Jon Haddad
            Reviewer:
            Robert Stupp
            Tester:
            Jon Haddad
          • Votes:
            1 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development