Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-806

ROW_NUMBER should emit distinct values

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      ROW_NUMBER should emit distinct values even for rows that have identical sort keys, or if there is no sort key. (This is distinct from RANK and DENSE_RANK, which emit the same value for rows with the same sort key.)

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in pull request https://github.com/apache/incubator-calcite/pull/107. Vladimir Sitnikov, can you please review and commit.

          Show
          julianhyde Julian Hyde added a comment - Fixed in pull request https://github.com/apache/incubator-calcite/pull/107 . Vladimir Sitnikov , can you please review and commit.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Julian Hyde, I do not agree with your analysis quite yet.

          My bet is parsing/sqltorel translation is wrong.

          `ROW_NUMBER() over ()` is translated to `ROW_NUMBER() over (BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)`.

          I'm puzzled what is the best way to solve that.

          I would suggest unconditionally rewrite `ROW_NUMBER`'s bounds to `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Julian Hyde , I do not agree with your analysis quite yet. My bet is parsing/sqltorel translation is wrong. `ROW_NUMBER() over ()` is translated to `ROW_NUMBER() over (BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)`. I'm puzzled what is the best way to solve that. I would suggest unconditionally rewrite `ROW_NUMBER`'s bounds to `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Please check https://github.com/vlsi/incubator-calcite/commits/806-row-number.
          It just converts bounds as described above.

          I'm not sure which validation should be added if any.

          PS. I would like to avoid adding code to existing test methods. Having separate tests helps to isolate&debug things.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Please check https://github.com/vlsi/incubator-calcite/commits/806-row-number . It just converts bounds as described above. I'm not sure which validation should be added if any. PS. I would like to avoid adding code to existing test methods. Having separate tests helps to isolate&debug things.
          Hide
          julianhyde Julian Hyde added a comment -

          Does that mean that it is illegal to use ROW_NUMBER with any other kind of bound? Your change implies that

          ROW_NUMBER() OVER (ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING)

          should be legal but would give the same result as

          ROW_NUMBER() OVER ()

          Is that right?

          Does your patch work under both JDK 1.7 and 1.8? I had to change my code because the inherent non-determinism gave different results. You modified my code and lost my later changes.

          Show
          julianhyde Julian Hyde added a comment - Does that mean that it is illegal to use ROW_NUMBER with any other kind of bound? Your change implies that ROW_NUMBER() OVER (ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING) should be legal but would give the same result as ROW_NUMBER() OVER () Is that right? Does your patch work under both JDK 1.7 and 1.8? I had to change my code because the inherent non-determinism gave different results. You modified my code and lost my later changes.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          I don't like discussing the spec, however it looks like we don't have a choice.

          Does that mean that it is illegal to use ROW_NUMBER with any other kind of bound?

          This is exactly how I read the following in the spec:

          If <rank function type> or ROW_NUMBER is specified, then:
          b) The window framing of WDX shall not be present.

          In other words, all rank functions+row_number cannot have framing.

          I believe we have a validation exactly for this reason:
          RankWithFrame=ROW/RANGE not allowed with RANK or DENSE_RANK functions
          Well, it makes sense adding ROW_NUMBER there as well (or parametrize the error message with actual name of the function).

          Ok, we continue:

          ROW_NUMBER() OVER WNS is equivalent to the <window function>: COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING)

          I read this as follows: whenever you see ROW_NUMBER() OVER WNS, you take window specification (remember, it does not contain framing), overlay it with ROWS UNBOUNDED PRECEDING, and replace ROW_NUMBER with COUNT(*).

          PostgreSQL just silently ignores ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING in row_number/rank/dense_rank.
          Oracle does not allow framing for rank/row_number.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - I don't like discussing the spec, however it looks like we don't have a choice. Does that mean that it is illegal to use ROW_NUMBER with any other kind of bound? This is exactly how I read the following in the spec: If <rank function type> or ROW_NUMBER is specified, then: b) The window framing of WDX shall not be present. In other words, all rank functions+row_number cannot have framing. I believe we have a validation exactly for this reason: RankWithFrame=ROW/RANGE not allowed with RANK or DENSE_RANK functions Well, it makes sense adding ROW_NUMBER there as well (or parametrize the error message with actual name of the function). Ok, we continue: ROW_NUMBER() OVER WNS is equivalent to the <window function>: COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING) I read this as follows: whenever you see ROW_NUMBER() OVER WNS , you take window specification (remember, it does not contain framing), overlay it with ROWS UNBOUNDED PRECEDING , and replace ROW_NUMBER with COUNT(*) . PostgreSQL just silently ignores ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING in row_number/rank/dense_rank. Oracle does not allow framing for rank/row_number.
          Hide
          julianhyde Julian Hyde added a comment -

          That makes sense. Do you think we should have a test that

          ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING

          is silently ignored?

          I think we're almost there. I'm going to revise the pull request, combining your changes and my changes. I want to get this in by Wednesday.

          Show
          julianhyde Julian Hyde added a comment - That makes sense. Do you think we should have a test that ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING is silently ignored? I think we're almost there. I'm going to revise the pull request, combining your changes and my changes. I want to get this in by Wednesday.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING is silently ignored?

          I would rather follow the spec and throw a validation error in such case.
          Why should we accept framing here? The spec does not allow it. It does not make much sense, so I vote for throwing.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING is silently ignored? I would rather follow the spec and throw a validation error in such case. Why should we accept framing here? The spec does not allow it. It does not make much sense, so I vote for throwing.
          Hide
          julianhyde Julian Hyde added a comment -

          That's fine. Whatever behavior we choose, we should test it.

          Show
          julianhyde Julian Hyde added a comment - That's fine. Whatever behavior we choose, we should test it.
          Show
          julianhyde Julian Hyde added a comment - Vladimir Sitnikov , Please see latest https://github.com/apache/incubator-calcite/pull/107 , https://github.com/julianhyde/incubator-calcite/commit/df582617135314ea6348de9142e4f265d70c9f96 . If you +1 I will squash and commit to master.
          Hide
          julianhyde Julian Hyde added a comment -

          Vladimir Sitnikov, Do you need more time to review? Or shall I just go ahead and check in?

          Show
          julianhyde Julian Hyde added a comment - Vladimir Sitnikov , Do you need more time to review? Or shall I just go ahead and check in?
          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Looks fine, tests pass. Committed as https://git1-us-west.apache.org/repos/asf?p=incubator-calcite.git;a=commit;h=096d2820b6a3
          Hide
          julianhyde Julian Hyde added a comment -

          Thanks!

          Show
          julianhyde Julian Hyde added a comment - Thanks!
          Hide
          jnadeau Jacques Nadeau added a comment -

          Resolved in release 1.4.0-incubating (2015-08-23)

          Show
          jnadeau Jacques Nadeau added a comment - Resolved in release 1.4.0-incubating (2015-08-23)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              julianhyde Julian Hyde
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development