Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-841

Redundant windows when window function arguments are expressions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      Firstly of all, this issue happens when HepPlanner is used with ProjectToWindowRule.PROJECT rule.

      A query with this pattern:

      select fn(col) over w, fn(expr) over w
      from ...
      

      will generate two "LogicalWindow" even if we have only an identical window frame.

      For example,

      select sum(deptno) over(partition by deptno order by sal) as sum1, 
      sum(deptno + deptno) over(partition by deptno order by sal) as sum2
      from emp
      

      gives:

      LogicalProject($0=[$2], $1=[$4])
        LogicalWindow(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($3)])])
          LogicalProject(SAL=[$0], DEPTNO=[$1], $2=[$2], $3=[+($1, $1)])
            LogicalProject(SAL=[$5], DEPTNO=[$7], $2=[$9])
              LogicalWindow(window#0=[window(partition {7} order by [5] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($7)])])
                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      

        Activity

        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment - - edited

        Interestingly, for the example provided above, if you flip the order in the select list, there would be only one LogicalWindow:

        LogicalProject($0=[$3], $1=[$4])
          LogicalWindow(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2), SUM($1)])])
            LogicalProject(SAL=[$5], DEPTNO=[$7], $2=[+($7, $7)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        
        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - - edited Interestingly, for the example provided above, if you flip the order in the select list, there would be only one LogicalWindow: LogicalProject($0=[$3], $1=[$4]) LogicalWindow(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2), SUM($1)])]) LogicalProject(SAL=[$5], DEPTNO=[$7], $2=[+($7, $7)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Pull request: https://github.com/apache/incubator-calcite/pull/120
        Hide
        amansinha100 Aman Sinha added a comment -

        Changes look ok to me. I made a suggestion for the test case.

        Show
        amansinha100 Aman Sinha added a comment - Changes look ok to me. I made a suggestion for the test case.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        Already addressed that in the pull request. Thanks for the feedback.

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Already addressed that in the pull request. Thanks for the feedback.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment - - edited

        Julian Hyde Can you help us review? Thanks

        Basically, out idea is to enforce a more restricted order in the Topological Sort so that less levels are needed to be generated.

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - - edited Julian Hyde Can you help us review? Thanks Basically, out idea is to enforce a more restricted order in the Topological Sort so that less levels are needed to be generated.
        Hide
        julianhyde Julian Hyde added a comment -

        You have a problem with a rule so are fixing the engine; and to fix the engine you fix a general-purpose algorithm (topological sort) which was not producing wrong results in the first place. Doesn't that seem wrong?

        I'm not sure where the real problem is, but I'm pretty sure it would help a lot if there were something like a WindowMergeRule. Then you would not be so sensitive to the order in which rules get fired. (Since these rules are supposed to work on multiple engines, you must not be too sensitive.)

        If after that you still want to fix TopologicalOrderIterator, let's do it by making "empties" a priority queue and you can pass in a comparator to the constructor. (In your case the comparator would sort based on the number of outgoing edges.) That would also fix the O(n ^ 2) running cost of your algorithm.

        Show
        julianhyde Julian Hyde added a comment - You have a problem with a rule so are fixing the engine; and to fix the engine you fix a general-purpose algorithm (topological sort) which was not producing wrong results in the first place. Doesn't that seem wrong? I'm not sure where the real problem is, but I'm pretty sure it would help a lot if there were something like a WindowMergeRule. Then you would not be so sensitive to the order in which rules get fired. (Since these rules are supposed to work on multiple engines, you must not be too sensitive.) If after that you still want to fix TopologicalOrderIterator, let's do it by making "empties" a priority queue and you can pass in a comparator to the constructor. (In your case the comparator would sort based on the number of outgoing edges.) That would also fix the O(n ^ 2) running cost of your algorithm.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        I think the problem is the "order of evaluation" suggested by the Topological Sort.

        In the given example, we have the following following expressions:

        (1). sum(deptno)
        (2). sum(deptno + deptno)
        (3). deptno + deptno
        

        Note that (1) and (2) correspond LogicalWindow and (3) corresponds to a LogicalProject.

        Apparently, (3) needs to be evaluated before (2). And, there is no dependency between (1) and (2). The "current" topological sort will give this order: (1), (3), (2).

        Although it is a correct topological order in the mathematical sense, this order actually suggests inserting a Project between Two Windows and splits the two LogicalWindow apart. However, if the order (3), (1), (2) is chosen, then, given the same window frame, CalcRelSplitter would not split the LogicalWindow.

        On the other hand, though the WindowMergeRule (and PushProjectOverWindowRule) would probably help resolve this issue, the LogicalWindow should not be split apart by CalcRelSplitter at the first place. In this issue, the root cause seems in CalcRelSplitter itself

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - I think the problem is the "order of evaluation" suggested by the Topological Sort. In the given example, we have the following following expressions: (1). sum(deptno) (2). sum(deptno + deptno) (3). deptno + deptno Note that (1) and (2) correspond LogicalWindow and (3) corresponds to a LogicalProject. Apparently, (3) needs to be evaluated before (2). And, there is no dependency between (1) and (2). The "current" topological sort will give this order: (1), (3), (2). Although it is a correct topological order in the mathematical sense, this order actually suggests inserting a Project between Two Windows and splits the two LogicalWindow apart. However, if the order (3), (1), (2) is chosen, then, given the same window frame, CalcRelSplitter would not split the LogicalWindow. On the other hand, though the WindowMergeRule (and PushProjectOverWindowRule) would probably help resolve this issue, the LogicalWindow should not be split apart by CalcRelSplitter at the first place. In this issue, the root cause seems in CalcRelSplitter itself
        Hide
        julianhyde Julian Hyde added a comment -

        the root cause seems in CalcRelSplitter itself

        I agree. There are many links in this chain (topological iterator, hep engine, CalcRelSplitter, lack of a rule to merge windows, and the rule that converts a window or windows to a physical plan) but CalcRelSplitter is probably the most important. Fixing CalcRelSplitter would fix this problem in both Hep and Volcano planners.

        Fixing topological iterator would be arbitrary – before this problem occurred, neither of us would have looked at topological iterator or even HepPlanner and concluded that it was doing the wrong thing. If we were to change it there is a good chance that the new order will be "wrong" for another case.

        Show
        julianhyde Julian Hyde added a comment - the root cause seems in CalcRelSplitter itself I agree. There are many links in this chain (topological iterator, hep engine, CalcRelSplitter, lack of a rule to merge windows, and the rule that converts a window or windows to a physical plan) but CalcRelSplitter is probably the most important. Fixing CalcRelSplitter would fix this problem in both Hep and Volcano planners. Fixing topological iterator would be arbitrary – before this problem occurred, neither of us would have looked at topological iterator or even HepPlanner and concluded that it was doing the wrong thing. If we were to change it there is a good chance that the new order will be "wrong" for another case.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        If we would like to fix in CalcRelSplitter, using "getCohort()" might be one way we could do.

        This method helps "place" specified expressions on the same level.

        Back to this JIRA issue, isn't it what we are asking for?

        I had an implementation here:
        https://github.com/hsuanyi/incubator-calcite/tree/CALCITE-841NEW

        Vladimir Sitnikov I saw you leave some comments/hints in the source code.

        Basically, I coded along that. So can you help pick up any potential issue ?

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - If we would like to fix in CalcRelSplitter, using "getCohort()" might be one way we could do. This method helps "place" specified expressions on the same level. Back to this JIRA issue, isn't it what we are asking for? I had an implementation here: https://github.com/hsuanyi/incubator-calcite/tree/CALCITE-841NEW Vladimir Sitnikov I saw you leave some comments/hints in the source code. Basically, I coded along that. So can you help pick up any potential issue ?
        Hide
        julianhyde Julian Hyde added a comment -

        I understand now. Changing how the cohorts are generated sounds perfect.

        (I thought that you were trying to change the order that HepPlanner fires rules. You can see why I misunderstood: both HepPlanner and CalcRelSplitter would be affected by changes to the topological sort, your patch does not touch either HepPlanner or CalcRelSplitter, and you only mention HepPlanner in your discussion. If you had added a parameter to TopologicalOrderIterator it would have been clear which component you were trying to change.)

        I took a quick look at your patch, and it looks good. I will review and commit when I have some more time.

        Your algorithm seems to be doing something like a topological sort. Did you consider using TopologicalOrderIterator (maybe with a Comparator to settle ties) for this?

        Show
        julianhyde Julian Hyde added a comment - I understand now. Changing how the cohorts are generated sounds perfect. (I thought that you were trying to change the order that HepPlanner fires rules. You can see why I misunderstood: both HepPlanner and CalcRelSplitter would be affected by changes to the topological sort, your patch does not touch either HepPlanner or CalcRelSplitter, and you only mention HepPlanner in your discussion. If you had added a parameter to TopologicalOrderIterator it would have been clear which component you were trying to change.) I took a quick look at your patch, and it looks good. I will review and commit when I have some more time. Your algorithm seems to be doing something like a topological sort. Did you consider using TopologicalOrderIterator (maybe with a Comparator to settle ties) for this?
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        Thanks for the prompt response
        Yes, we will try to improve there.

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the prompt response Yes, we will try to improve there.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        Thanks for the prompt response
        Yes, we will try to improve there.

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Thanks for the prompt response Yes, we will try to improve there.
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -
        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Sent out a pull request: https://github.com/apache/incubator-calcite/pull/124
        Hide
        seanhychu Sean Hsuan-Yi Chu added a comment -

        Julian Hyde Can you help review? Thanks!

        Show
        seanhychu Sean Hsuan-Yi Chu added a comment - Julian Hyde Can you help review? Thanks!
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/c1fb8299 . Thanks for the patch, Sean Hsuan-Yi Chu !
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.5.0 (2015-11-10)

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            seanhychu Sean Hsuan-Yi Chu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development