Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-845

Derive SUM’s return type by a customizable policy

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:

      Description

      The purpose is to return correct sum result when the number goes beyond its type limitation. Currently Calcite behaves "type sum( x ) = type x", so if the sum overflows, user get negative result.

      Traditional SQL engines all expand the sum return type though in different ways.

      From Julian Hyde, Aug 11:

      Postgres return type is "bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type” [ http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126 ]

      SQL Server return type is int for tinyint, smallint or int; bigint for bigint [ http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf ].

      I can see your point that “User demands the correct sum result”. But I’d also be pissed with Postgres if it returned a numeric (arbitrary precision) result when I am summing a bigint value. So I don’t think we’re going to please everyone.

      I think the solution is to add the policy to derive SUM’s return type to as a new method to RelDataTypeSystem. Then Kylin can supply its own.

      1. CALCITE-845_2.patch
        9 kB
        Maryann Xue
      2. CALCITE-845.patch
        10 kB
        Maryann Xue

        Activity

        Hide
        jamestaylor James Taylor added a comment -

        Maryann Xue - this is the JIRA that will help make our SUM aggregation to work, since we coerce to a long any sum over integer numbers.

        Show
        jamestaylor James Taylor added a comment - Maryann Xue - this is the JIRA that will help make our SUM aggregation to work, since we coerce to a long any sum over integer numbers.
        Hide
        julianhyde Julian Hyde added a comment -

        Let me know if you need help instantiating a RelDataTypeSystem and getting Calcite to use that throughout.

        Adding methods to RelDataTypeSystem is not something we plan to do regularly but I think SUM is an important enough case. Maybe COUNT also.

        Show
        julianhyde Julian Hyde added a comment - Let me know if you need help instantiating a RelDataTypeSystem and getting Calcite to use that throughout. Adding methods to RelDataTypeSystem is not something we plan to do regularly but I think SUM is an important enough case. Maybe COUNT also.
        Hide
        maryannxue Maryann Xue added a comment -

        Thanks, Julian Hyde! I got SUM working in Phoenix/Calcite with this change. Could you please review the patch?

        Show
        maryannxue Maryann Xue added a comment - Thanks, Julian Hyde ! I got SUM working in Phoenix/Calcite with this change. Could you please review the patch?
        Hide
        julianhyde Julian Hyde added a comment - - edited

        Looks as if you put the method on RelDataTypeFactory rather than RelDataTypeSystem. I suggested the latter because it is difficult to control the creation of every RelDataTypeFactory during the preparation of a query, and I still think it is the right place.

        In the javadoc can you mention what the default rule is. I think it is that the type of SUM( x ) is the same as x.

        Show
        julianhyde Julian Hyde added a comment - - edited Looks as if you put the method on RelDataTypeFactory rather than RelDataTypeSystem. I suggested the latter because it is difficult to control the creation of every RelDataTypeFactory during the preparation of a query, and I still think it is the right place. In the javadoc can you mention what the default rule is. I think it is that the type of SUM( x ) is the same as x.
        Hide
        maryannxue Maryann Xue added a comment -

        I agree that plugging in custom RelDataTypeFactory would be difficult. Actually it took me to implement a sub-class of CalciteFactory just to get this into PhoenixCalciteDriver. So do you have any suggestion on the signature of "createSumAggFunctionTypeWithNullability()" as I move it into RelDataTypeSystem? or is it OK to leave it that way?

        Show
        maryannxue Maryann Xue added a comment - I agree that plugging in custom RelDataTypeFactory would be difficult. Actually it took me to implement a sub-class of CalciteFactory just to get this into PhoenixCalciteDriver. So do you have any suggestion on the signature of "createSumAggFunctionTypeWithNullability()" as I move it into RelDataTypeSystem? or is it OK to leave it that way?
        Hide
        maryannxue Maryann Xue added a comment -

        Just realized that I couldn't do the "create type" thing in RelDataTypeSystem. I instead tried with returning type name, say, "getSumAggFunctionTypeName(operandTypeName)", which did not allow the type system implementation to have control over precision or scale. Any way to get around this, Julian Hyde?

        Show
        maryannxue Maryann Xue added a comment - Just realized that I couldn't do the "create type" thing in RelDataTypeSystem. I instead tried with returning type name, say, "getSumAggFunctionTypeName(operandTypeName)", which did not allow the type system implementation to have control over precision or scale. Any way to get around this, Julian Hyde ?
        Hide
        julianhyde Julian Hyde added a comment - - edited

        I think

        RelDataType deriveSumType(RelDataTypeFactory typeFactory, RelDataType argumentType)

        would work. This adds a type-factory to your method, and takes a way the nullable flag. The caller (i.e. Calcite's validator) already knows the rules of nullability - e.g. that 'select sum( x ) from t group by g' should be not null and 'select sum( x ) from t' should be nullable - and there is no need for projects to override. After calling deriveSumType, the Calcite validator will call RelDataTypeFactory.createTypewithNullability to adjust nullability.

        Show
        julianhyde Julian Hyde added a comment - - edited I think RelDataType deriveSumType(RelDataTypeFactory typeFactory, RelDataType argumentType) would work. This adds a type-factory to your method, and takes a way the nullable flag. The caller (i.e. Calcite's validator) already knows the rules of nullability - e.g. that 'select sum( x ) from t group by g' should be not null and 'select sum( x ) from t' should be nullable - and there is no need for projects to override. After calling deriveSumType, the Calcite validator will call RelDataTypeFactory.createTypewithNullability to adjust nullability.
        Hide
        maryannxue Maryann Xue added a comment -

        Thanks for the advice, Julian Hyde! Patch updated.

        Show
        maryannxue Maryann Xue added a comment - Thanks for the advice, Julian Hyde ! Patch updated.
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/2fd8c5a4 . Thanks for the patch, Maryann Xue !
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.5.0 (2015-11-10)

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            liyang.gmt8@gmail.com liyang
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development