Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1931

Change the return type of RANK, NTILE and other aggregate functions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: core
    • Labels:
      None

      Description

      RANK, NTILE etc. functions should not always return INTEGER. For example, RANK() and PERCENT_RANK() both return INTEGER. For RANK(), it would be better to return BIGINT; for PERCENT_RANK(), we should actually return DOUBLE or FLOAT since it is a percent value. There are other rank functions that should return floating number instead of INTEGER.

        Activity

        Hide
        michaelmior Michael Mior added a comment -

        Resolved in release 1.14.0 (2017-10-01)

        Show
        michaelmior Michael Mior added a comment - Resolved in release 1.14.0 (2017-10-01)
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/54ed57f8 .
        Hide
        minjikim MinJi Kim added a comment -

        I have updated the pull request with some additional changes for you to review. Thanks!

        https://github.com/apache/calcite/pull/511

        Show
        minjikim MinJi Kim added a comment - I have updated the pull request with some additional changes for you to review. Thanks! https://github.com/apache/calcite/pull/511
        Hide
        julianhyde Julian Hyde added a comment -

        One more thing. Can you add a paragraph to the release notes (HISTORY.md). Something like this:

        Breaking change. Various aggregate functions that used to return INTEGER now return other types. RANK, DENSE_RANK and NTILE now return BIGINT; CUME_DIST and PERCENT_RANK now return DOUBLE.

        Show
        julianhyde Julian Hyde added a comment - One more thing. Can you add a paragraph to the release notes (HISTORY.md). Something like this: Breaking change. Various aggregate functions that used to return INTEGER now return other types. RANK, DENSE_RANK and NTILE now return BIGINT; CUME_DIST and PERCENT_RANK now return DOUBLE.
        Hide
        julianhyde Julian Hyde added a comment -

        That makes sense. Thanks for doing the research. Let me know when you have a commit ready for review.

        Show
        julianhyde Julian Hyde added a comment - That makes sense. Thanks for doing the research. Let me know when you have a commit ready for review.
        Hide
        minjikim MinJi Kim added a comment -

        I think CUME_DIST() and PERCENT_RANK() for example, should return floating point, since generally their return values are less than or equal to 1 (I looked at Oracle and MS SqlServer, and they both return fractional values and 1 for these functions). I changed the other functions (e.g. NTILE/RANK/DENSE_RANK) to return BIGINT rather than INT. For example, MS SqlServer returns BIGINT for NTILE/RANK/DENSE_RANK functions.

        I didn't think they were controversial since I don't think CUME_DIST() should return integers, otherwise all the values would be zero. But perhaps somebody is using that behavior somewhere. I can definitely add a way for users to specify return values of RANK functions as in CALCITE-845. I would probably just allow two policies, one for CUME_DIST/PERCENT_RANK and another for NTILE/RANK/DENSE_RANK (with the default being the new behavior as in this patch). Do you think that makes sense?

        Show
        minjikim MinJi Kim added a comment - I think CUME_DIST() and PERCENT_RANK() for example, should return floating point, since generally their return values are less than or equal to 1 (I looked at Oracle and MS SqlServer, and they both return fractional values and 1 for these functions). I changed the other functions (e.g. NTILE/RANK/DENSE_RANK) to return BIGINT rather than INT. For example, MS SqlServer returns BIGINT for NTILE/RANK/DENSE_RANK functions. I didn't think they were controversial since I don't think CUME_DIST() should return integers, otherwise all the values would be zero. But perhaps somebody is using that behavior somewhere. I can definitely add a way for users to specify return values of RANK functions as in CALCITE-845 . I would probably just allow two policies, one for CUME_DIST/PERCENT_RANK and another for NTILE/RANK/DENSE_RANK (with the default being the new behavior as in this patch). Do you think that makes sense?
        Hide
        julianhyde Julian Hyde added a comment -

        What does the SQL standard say about these functions' return types?

        Are these new return types completely un-controversial? If not, should we let people specify a policy, as we did in CALCITE-845?

        The extra arg to SqlRankFunction's constructor looks like a breaking change. Leave the current constructor, marked deprecated.

        Show
        julianhyde Julian Hyde added a comment - What does the SQL standard say about these functions' return types? Are these new return types completely un-controversial? If not, should we let people specify a policy, as we did in CALCITE-845 ? The extra arg to SqlRankFunction 's constructor looks like a breaking change. Leave the current constructor, marked deprecated.
        Hide
        minjikim MinJi Kim added a comment -

        Thanks again for reviewing this!

        https://github.com/apache/calcite/pull/511

        Show
        minjikim MinJi Kim added a comment - Thanks again for reviewing this! https://github.com/apache/calcite/pull/511

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            minjikim MinJi Kim
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development