Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      The OVER windows have been implemented by several contributors.
      We should do a post pass over the contributed code and improve a few things.

      • Functionality
        • Currently every time attribute is allowed as ORDER BY attribute. We must check that this is actually a time indicator (procTime(), rowTime()) and that the order is ASCENDING.
      • Documentation
        • Add documentation for OVER windows
      • Code style
        • Consistent naming of ProcessFunctions and methods
      • Tests
        • Move the OVER window tests out of SqlITCase into a dedicated class
        • Move the OVER window tests out of WindowAggregateTest into a dedicated class
        • Add tests based on the test harness for all ProcessFunctions similar to BoundedProcessingOverRangeProcessFunction. The tests should include exact boundary checks for range windows and check for proper parallelization with multiple keys.

        Issue Links

          Activity

          Hide
          fhueske Fabian Hueske added a comment -

          Done with 9f2293cfdab960246fe1aea1d705eea18a011761

          Show
          fhueske Fabian Hueske added a comment - Done with 9f2293cfdab960246fe1aea1d705eea18a011761
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3697

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3697
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sunjincheng121 commented on the issue:

          https://github.com/apache/flink/pull/3697

          Rebase code from master.

          Show
          githubbot ASF GitHub Bot added a comment - Github user sunjincheng121 commented on the issue: https://github.com/apache/flink/pull/3697 Rebase code from master.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sunjincheng121 commented on the issue:

          https://github.com/apache/flink/pull/3697

          @zentol Thank you for the reminder. And I added some change messages.
          Thanks,
          SunJincheng

          Show
          githubbot ASF GitHub Bot added a comment - Github user sunjincheng121 commented on the issue: https://github.com/apache/flink/pull/3697 @zentol Thank you for the reminder. And I added some change messages. Thanks, SunJincheng
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3697

          Please provide some description as to the changes that you made, it's a 2000+ line diff after all.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3697 Please provide some description as to the changes that you made, it's a 2000+ line diff after all.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user sunjincheng121 opened a pull request:

          https://github.com/apache/flink/pull/3697

          FLINK-6257[table]Optimize test cases

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [] General
          • The pull request references the related JIRA issue ("FLINK-6257[table]Optimize test cases")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [×] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/sunjincheng121/flink FLINK-6257-PR-Tests

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3697.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3697


          commit 4ff7af2ea11eb78a945243d234b0a1a6b4624fcc
          Author: sunjincheng121 <sunjincheng121@gmail.com>
          Date: 2017-04-06T07:33:50Z

          FLINK-6257[table]Optimize test cases


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user sunjincheng121 opened a pull request: https://github.com/apache/flink/pull/3697 FLINK-6257 [table] Optimize test cases Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [] General The pull request references the related JIRA issue (" FLINK-6257 [table] Optimize test cases") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [ ] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [×] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sunjincheng121/flink FLINK-6257 -PR-Tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3697.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3697 commit 4ff7af2ea11eb78a945243d234b0a1a6b4624fcc Author: sunjincheng121 <sunjincheng121@gmail.com> Date: 2017-04-06T07:33:50Z FLINK-6257 [table] Optimize test cases
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3681

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3681
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3681

          Merging

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3681 Merging
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3681

          Thanks for the PR @sunjincheng121.
          +1 to merge

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3681 Thanks for the PR @sunjincheng121. +1 to merge
          Hide
          sunjincheng121 sunjincheng added a comment -

          Hi Fabian Hueske I have opened the first PR which addressed Functionality and Code style. And the follow-up two PRs will fix Tests and Documentation.

          Thanks,
          SunJincheng

          Show
          sunjincheng121 sunjincheng added a comment - Hi Fabian Hueske I have opened the first PR which addressed Functionality and Code style . And the follow-up two PRs will fix Tests and Documentation . Thanks, SunJincheng
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user sunjincheng121 opened a pull request:

          https://github.com/apache/flink/pull/3681

          FLINK-6257[table] Consistent naming of ProcessFunctions and met…

          …hods, And add ORDER BY checking

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [x] General
          • The pull request references the related JIRA issue ("FLINK-6257 Consistent naming of ProcessFunctions and methods, And add ORDER BY checking")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [ ] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/sunjincheng121/flink FLINK-6257-PR-CodeStyleAndFunctionality

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3681.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3681


          commit f137ce2777a5b5774c11bebb842e57d2c1c5614f
          Author: sunjincheng121 <sunjincheng121@gmail.com>
          Date: 2017-04-06T02:33:30Z

          FLINK-6257[table] Consistent naming of ProcessFunctions and methods, And add ORDER BY checking


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user sunjincheng121 opened a pull request: https://github.com/apache/flink/pull/3681 FLINK-6257 [table] Consistent naming of ProcessFunctions and met… …hods, And add ORDER BY checking Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [x] General The pull request references the related JIRA issue (" FLINK-6257 Consistent naming of ProcessFunctions and methods, And add ORDER BY checking") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [ ] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [ ] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sunjincheng121/flink FLINK-6257 -PR-CodeStyleAndFunctionality Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3681.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3681 commit f137ce2777a5b5774c11bebb842e57d2c1c5614f Author: sunjincheng121 <sunjincheng121@gmail.com> Date: 2017-04-06T02:33:30Z FLINK-6257 [table] Consistent naming of ProcessFunctions and methods, And add ORDER BY checking
          Hide
          sunjincheng121 sunjincheng added a comment - - edited

          Hi,Fabian Hueske That's fine.
          About Consistent naming of ProcessFunctions and methods, I want consistent them by the follow rule:
          For class name:

          (ProcTime/RowTime) + (Bounded/Unbounded) + [Partitioned/NonPartitioned] +  [Range/Rows] + Over}}
          

          Then we'll get the follow class names:

          • ProcTimeBoundedRangeOver
          • ProcTimeBoundedRowsOver
          • ProcTimeUnboundedPartitionedOver
          • ProcTimeUnboundedNonPartitionedOver
          • RowTimeBoundedRangeOver
          • RowTimeBoundedRowsOver
          • RowTimeUnboundedOver(abstract)
          • RowTimeUnboundedRangeOver
          • RowTimeUnboundedRowsOver

          For methods:

          create + (Bounded/Unbounded) + Over + processFunction
          

          Then we'll get the follow methods names:

          • createUnboundedOverProcessFunction
          • createBoundedOverProcessFunction

          What do you think?
          Best,
          SunJincheng

          Show
          sunjincheng121 sunjincheng added a comment - - edited Hi, Fabian Hueske That's fine. About Consistent naming of ProcessFunctions and methods , I want consistent them by the follow rule: For class name: (ProcTime/RowTime) + (Bounded/Unbounded) + [Partitioned/NonPartitioned] + [Range/Rows] + Over}} Then we'll get the follow class names: ProcTimeBoundedRangeOver ProcTimeBoundedRowsOver ProcTimeUnboundedPartitionedOver ProcTimeUnboundedNonPartitionedOver RowTimeBoundedRangeOver RowTimeBoundedRowsOver RowTimeUnboundedOver(abstract) RowTimeUnboundedRangeOver RowTimeUnboundedRowsOver For methods: create + (Bounded/Unbounded) + Over + processFunction Then we'll get the follow methods names: createUnboundedOverProcessFunction createBoundedOverProcessFunction What do you think? Best, SunJincheng
          Hide
          fhueske Fabian Hueske added a comment -

          Just noticed, this is already a subissue. We cannot create nested subissues. So please assign it to yourself and open a PR with all fixes you want to contribute. We can then do a follow-up PR to fix the remaining issues. Thanks!

          Show
          fhueske Fabian Hueske added a comment - Just noticed, this is already a subissue. We cannot create nested subissues. So please assign it to yourself and open a PR with all fixes you want to contribute. We can then do a follow-up PR to fix the remaining issues. Thanks!
          Hide
          sunjincheng121 sunjincheng added a comment -

          Yes Fabian Hueske I am glad to do it if you can split it into sub-issues and assign it to me.
          Best,
          SunJincheng

          Show
          sunjincheng121 sunjincheng added a comment - Yes Fabian Hueske I am glad to do it if you can split it into sub-issues and assign it to me. Best, SunJincheng
          Hide
          fhueske Fabian Hueske added a comment -

          I'd do the check in the DataStreamOverAggregate. This is how we do many checks and we do not need to touch the Calcite Validator.

          Do you want to work on this issue sunjincheng?
          We can also split it into subissues.

          Show
          fhueske Fabian Hueske added a comment - I'd do the check in the DataStreamOverAggregate . This is how we do many checks and we do not need to touch the Calcite Validator. Do you want to work on this issue sunjincheng ? We can also split it into subissues.
          Hide
          sunjincheng121 sunjincheng added a comment -

          Hi Fabian Hueske I like this proposal very much. We really need to check the field of ORDER BY to ensure that the user is clearly aware of their behavior. And to ensure that the achieve logical of over window achieve is rigorous.
          IMO. In order to check ORDER BY clause. We have two phases can be checked. First one is in planner.validate phase, In this phase will check the legitimacy of the `SqlWindow`,Then we can extend this validation to increase the validation of ORDER BY. But perhaps this is more difficult, because due to calcite visibility issues, sql window some properties FLINK cannot get to. Second one is in DataStreamOverAggregate/DataStreamOverAggregateRule phase, In this phase is very easy to do the checking. Because we can get the TimeType
          inputType.getFieldList.get(overWindow.orderKeys.getFieldCollations.get(0).getFieldIndex).getValue and orderType overWindow.orderKeys.getFieldCollations.get(0).direction. What do you think?

          Show
          sunjincheng121 sunjincheng added a comment - Hi Fabian Hueske I like this proposal very much. We really need to check the field of ORDER BY to ensure that the user is clearly aware of their behavior. And to ensure that the achieve logical of over window achieve is rigorous. IMO. In order to check ORDER BY clause. We have two phases can be checked. First one is in planner.validate phase, In this phase will check the legitimacy of the `SqlWindow`,Then we can extend this validation to increase the validation of ORDER BY. But perhaps this is more difficult, because due to calcite visibility issues, sql window some properties FLINK cannot get to. Second one is in DataStreamOverAggregate/DataStreamOverAggregateRule phase, In this phase is very easy to do the checking. Because we can get the TimeType inputType.getFieldList.get(overWindow.orderKeys.getFieldCollations.get(0).getFieldIndex).getValue and orderType overWindow.orderKeys.getFieldCollations.get(0).direction . What do you think?

            People

            • Assignee:
              sunjincheng121 sunjincheng
              Reporter:
              fhueske Fabian Hueske
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development