Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-800

Window function defined within another window function should be invalid

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      For instance,

      select sum(deptno) over (order by 
      sum(deptno) over(order by deptno)) 
      from emp
      

      This is an invalid query. However, it passes the validation.

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Agreed, it is invalid. The validator should give an error (not the parser).

          Show
          julianhyde Julian Hyde added a comment - Agreed, it is invalid. The validator should give an error (not the parser).
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          In SqlValidatorTest.testWindowClause, we have a positive test case :
          "window w as (order by rank() over (order by sal))"

          Is this window is invalid? We tried a similar thing is postgres:

          postgres=# select rank() over(order by row_number() over(order by XXX)) from YYY;
          ERROR:  window functions are not allowed in window definitions
          

          Is it alright to remove that test case ? Otherwise, our patch will always break that one.

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - In SqlValidatorTest.testWindowClause, we have a positive test case : "window w as (order by rank() over (order by sal))" Is this window is invalid? We tried a similar thing is postgres: postgres=# select rank() over(order by row_number() over(order by XXX)) from YYY; ERROR: window functions are not allowed in window definitions Is it alright to remove that test case ? Otherwise, our patch will always break that one.
          Hide
          julianhyde Julian Hyde added a comment -

          "window w as (order by rank() over (order by sal))" is invalid. Can you please keep the test case, but convert it to a negative case.

          The "syntax rule 7" referred to in the comment is as follows:

          Section 6.10 <window function>
          7) If <ntile function>, <lead or lag function>, <rank function type> or ROW_NUMBER is specified, then:
          a) If <ntile function>, <lead or lag function>, RANK or DENSE_RANK is specified, then the window
          ordering clause WOC of WDX shall be present.
          b) The window framing clause of WDX shall not be present.
          ...
          

          Since you're modifying a test case, can you make sure there are test cases that:
          (1) it's an error if RANK does not have ORDER BY,
          (2) it's an error if RANK has RANGE or ROWS,
          (3) it's ok if ROW_NUMBER does not have ORDER BY,
          (4) it's ok if ROW_NUMBER does have ORDER BY,
          (5) it's an error if ROW_NUMBER has RANGE or ROWS.

          Show
          julianhyde Julian Hyde added a comment - "window w as (order by rank() over (order by sal))" is invalid. Can you please keep the test case, but convert it to a negative case. The "syntax rule 7" referred to in the comment is as follows: Section 6.10 <window function> 7) If <ntile function>, <lead or lag function>, <rank function type> or ROW_NUMBER is specified, then: a) If <ntile function>, <lead or lag function>, RANK or DENSE_RANK is specified, then the window ordering clause WOC of WDX shall be present. b) The window framing clause of WDX shall not be present. ... Since you're modifying a test case, can you make sure there are test cases that: (1) it's an error if RANK does not have ORDER BY, (2) it's an error if RANK has RANGE or ROWS, (3) it's ok if ROW_NUMBER does not have ORDER BY, (4) it's ok if ROW_NUMBER does have ORDER BY, (5) it's an error if ROW_NUMBER has RANGE or ROWS.
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          I will make it a negative case. Besides, for the addition cases you listed:
          (1). In SqlValidatorTest.testWindowFunctions2(), rule 6a includes this.
          (2). In SqlValidatorTest.testWindowFunctions2(), rule 6b includes this.
          (3). In SqlValidatorTest.testWindowFunctions2(), there is one.
          *(4). Did you mean "partition by"? I think ROW_NUMBER requires ORDER BY (oracle: http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions144.htm#SQLRF06100).
          (5). Added one

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - I will make it a negative case. Besides, for the addition cases you listed: (1). In SqlValidatorTest.testWindowFunctions2(), rule 6a includes this. (2). In SqlValidatorTest.testWindowFunctions2(), rule 6b includes this. (3). In SqlValidatorTest.testWindowFunctions2(), there is one. *(4). Did you mean "partition by"? I think ROW_NUMBER requires ORDER BY (oracle: http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions144.htm#SQLRF06100 ). (5). Added one
          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - https://github.com/apache/incubator-calcite/pull/106
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          Julian Hyde, can you help review?

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Julian Hyde , can you help review?
          Hide
          julianhyde Julian Hyde added a comment -

          Re (4). Read the spec that I pasted above. ROW_NUMBER does not require ORDER BY.

          Show
          julianhyde Julian Hyde added a comment - Re (4). Read the spec that I pasted above. ROW_NUMBER does not require ORDER BY.
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -
          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Addressed in the pull request. https://github.com/apache/incubator-calcite/pull/106
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited

          Regarding (4)

          4.1) It is likely an application bug if it generates row_number without order by. That does not make much sense in allowing such queries in.
          4.2)
          Quote from SQL2003:

          The ROW_NUMBER function computes the sequential row number, starting with 1 (one) for the first row, of the row within its window partition according to the window ordering of the window.

          This assumes "window ordering" exists. Doesn't it?

          4.3)
          Quote from SQL2003:

          ROW_NUMBER() OVER WNS is equivalent to the <window function>: COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING)

          When I issue count(*) over (rows unbounded preceding) PostgreSQL returns some different values, so it works with it's own understanding of ordering. That means that is at least PG bug.

          I do not find yet in the spec, however I believe it should require specification of ordering if you provide framing.
          Well, there is "If the window ordering clause of a window structure descriptor is absent, then the window ordering is
          implementation-dependent.", however that is not a good point to "enable row_number without order by".

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - - edited Regarding (4) 4.1) It is likely an application bug if it generates row_number without order by. That does not make much sense in allowing such queries in. 4.2) Quote from SQL2003: The ROW_NUMBER function computes the sequential row number, starting with 1 (one) for the first row, of the row within its window partition according to the window ordering of the window. This assumes "window ordering" exists. Doesn't it? 4.3) Quote from SQL2003: ROW_NUMBER() OVER WNS is equivalent to the <window function>: COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING) When I issue count(*) over (rows unbounded preceding) PostgreSQL returns some different values, so it works with it's own understanding of ordering. That means that is at least PG bug. I do not find yet in the spec, however I believe it should require specification of ordering if you provide framing. Well, there is "If the window ordering clause of a window structure descriptor is absent, then the window ordering is implementation-dependent.", however that is not a good point to "enable row_number without order by".
          Hide
          julianhyde Julian Hyde added a comment -

          You might say the same thing about LIMIT without ORDER BY. Yet the standard allows it.

          Is the standard not clear? Or do you just not agree with it?

          Show
          julianhyde Julian Hyde added a comment - You might say the same thing about LIMIT without ORDER BY. Yet the standard allows it. Is the standard not clear? Or do you just not agree with it?
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          The standard is not clear – see my 4.2.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - The standard is not clear – see my 4.2.
          Hide
          julianhyde Julian Hyde added a comment - - edited

          You've quoted from the "Concepts" section. About the Concepts it says:

          This Clause describes concepts that are, for the most part, specified precisely in other parts of ISO/IEC 9075. In any case of discrepancy, the specification in the other part is to be presumed correct.

          Elsewhere it also says

          NOTE 299 — Although ROW_NUMBER is non-deterministic, ...

          The intent is pretty clear. ROW_NUMBER is non-deterministic. In case of a tie, it assigns sequence numbers to the tied elements in arbitrary order.

          (That is the only difference between it and RANK. "RANK ... ORDER BY ()" would return 1 for all elements and therefore be useless; "ROW_NUMBER ... ORDER BY ()" or equivalently "ROW_NUMBER" assigns unique sequence numbers, and that is useful. For example you can use it to delete all but one copy of duplicate rows in a table. And it sounds as if Postgres botched it, and implemented ROW_NUMBER as if it were RANK.)

          Quite a few parts of SQL are non-deterministic. I mentioned LIMIT without ORDER BY earlier. And even ORDER BY is non-deterministic if the order key is not unique.

          Can we wind up this discussion now please? I'd rather see a patch with an implementation then spend more time arguing like lawyers or religious scholars.

          Show
          julianhyde Julian Hyde added a comment - - edited You've quoted from the "Concepts" section. About the Concepts it says: This Clause describes concepts that are, for the most part, specified precisely in other parts of ISO/IEC 9075. In any case of discrepancy, the specification in the other part is to be presumed correct. Elsewhere it also says NOTE 299 — Although ROW_NUMBER is non-deterministic, ... The intent is pretty clear. ROW_NUMBER is non-deterministic. In case of a tie, it assigns sequence numbers to the tied elements in arbitrary order. (That is the only difference between it and RANK. "RANK ... ORDER BY ()" would return 1 for all elements and therefore be useless; "ROW_NUMBER ... ORDER BY ()" or equivalently "ROW_NUMBER" assigns unique sequence numbers, and that is useful. For example you can use it to delete all but one copy of duplicate rows in a table. And it sounds as if Postgres botched it, and implemented ROW_NUMBER as if it were RANK.) Quite a few parts of SQL are non-deterministic. I mentioned LIMIT without ORDER BY earlier. And even ORDER BY is non-deterministic if the order key is not unique. Can we wind up this discussion now please? I'd rather see a patch with an implementation then spend more time arguing like lawyers or religious scholars.
          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/ef5833f3. (I don't mean the discussion is over – although I hope it is – but the original issue, namely the validation of window functions, is fixed.) Thanks for the patch, Sean Hsuan-Yi Chu!

          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/ef5833f3 . (I don't mean the discussion is over – although I hope it is – but the original issue, namely the validation of window functions, is fixed.) Thanks for the patch, Sean Hsuan-Yi Chu !
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          Thanks for the pointers!

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the pointers!
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          Thanks for the pointers!

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the pointers!
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          Thanks for the pointers!

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the pointers!
          Hide
          seanhychu Sean Hsuan-Yi Chu added a comment -

          Thanks for the pointers!

          Show
          seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the pointers!
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Can we wind up this discussion now please?

          OK. Thanks for the clarification.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Can we wind up this discussion now please? OK. Thanks for the clarification.
          Hide
          jnadeau Jacques Nadeau added a comment -

          Resolved in release 1.4.0-incubating (2015-08-23)

          Show
          jnadeau Jacques Nadeau added a comment - Resolved in release 1.4.0-incubating (2015-08-23)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              seanhychu Sean Hsuan-Yi Chu
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development