Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      Add support for the TUMBLE, HOP, SESSION keywords for batch SQL.

        Issue Links

          Activity

          Hide
          wheat9 Haohui Mai added a comment -

          Fabian Hueske, I am interested in contributing this feature. If you haven't started working on it, do you mind assigning to me? Thanks.

          Show
          wheat9 Haohui Mai added a comment - Fabian Hueske , I am interested in contributing this feature. If you haven't started working on it, do you mind assigning to me? Thanks.
          Hide
          fhueske Fabian Hueske added a comment -

          Hi Haohui Mai, I've already implemented this feature and will open a PR once the support for group window function on streams is merged.

          For some reason I thought you were focusing on streaming SQL and not so much interested in the batch side. That's why I picked the issue myself. Otherwise, I would have left it to you.

          Show
          fhueske Fabian Hueske added a comment - Hi Haohui Mai , I've already implemented this feature and will open a PR once the support for group window function on streams is merged. For some reason I thought you were focusing on streaming SQL and not so much interested in the batch side. That's why I picked the issue myself. Otherwise, I would have left it to you.
          Hide
          wheat9 Haohui Mai added a comment -

          That's even better – looking forward to the PR

          Show
          wheat9 Haohui Mai added a comment - That's even better – looking forward to the PR
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user fhueske opened a pull request:

          https://github.com/apache/flink/pull/3675

          FLINK-6261 [table] Support TUMBLE, HOP, SESSION group window functions for SQL queries on batch tables

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/fhueske/flink tableTumbleBatch

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3675.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3675


          commit 876107698aa2d601d5f398ea56fd93eeb03b117a
          Author: Fabian Hueske <fhueske@apache.org>
          Date: 2017-04-04T13:19:25Z

          FLINK-6261 [table] Support TUMBLE, HOP, SESSION group window functions for SQL queries on batch tables.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/3675 FLINK-6261 [table] Support TUMBLE, HOP, SESSION group window functions for SQL queries on batch tables You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink tableTumbleBatch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3675.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3675 commit 876107698aa2d601d5f398ea56fd93eeb03b117a Author: Fabian Hueske <fhueske@apache.org> Date: 2017-04-04T13:19:25Z FLINK-6261 [table] Support TUMBLE, HOP, SESSION group window functions for SQL queries on batch tables.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user haohui commented on the issue:

          https://github.com/apache/flink/pull/3675

          Thanks for the PR! It looks good to me overall. One problem is that `dataSet.LogicalWindowAggregateRule` has duplicated quite a bit of code in the `dataStream.LogicalWindowAggregateRule`.

          Maybe it makes sense to refactor some of the code into a common class, so that we can follow the same pattern for the implementation of the WindowStart / WindowEnd functions?

          Show
          githubbot ASF GitHub Bot added a comment - Github user haohui commented on the issue: https://github.com/apache/flink/pull/3675 Thanks for the PR! It looks good to me overall. One problem is that `dataSet.LogicalWindowAggregateRule` has duplicated quite a bit of code in the `dataStream.LogicalWindowAggregateRule`. Maybe it makes sense to refactor some of the code into a common class, so that we can follow the same pattern for the implementation of the WindowStart / WindowEnd functions?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3675

          Thanks for the review @haohui!

          You're right about the code duplication. I had started with addressing DataSet and DataStream in one class but ended up with many `if(stream)` conditions.
          I'll refactor that into an abstract class with two subclasses and update the PR soon.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3675 Thanks for the review @haohui! You're right about the code duplication. I had started with addressing DataSet and DataStream in one class but ended up with many `if(stream)` conditions. I'll refactor that into an abstract class with two subclasses and update the PR soon.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3675

          Hi @haohui, I updated the PR and refactored the rule into an abstract class that is extended by a rule for datastream and another one for dataset.

          *Please note*: I also removed support for `GROUP BY FLOOR/CEIL` on streams. My motivation is the following:

          • it can be expressed with `TUMBLE` (which is even more expressive)
          • We will add support for "non-windowed" aggregations (see PR #3646 by @shaoxuan-wang ) which can include the `GROUP BY FLOOR/CEIL` case. These aggregations will compute the same result but behave differently at runtime. Instead of collecting data and emitting a final result, they will be incrementally computed and send out more updates. I think the behavior of system is more predictable if we only translate Calcite's group window functions into DataStream group windows and all other grouped aggregates into non-windowed aggregates. We target the non-windowed aggregates for the next release.
          • It will make the translation easier. The pattern matching of `FLOOR/CEIL` seems a bit fragile to me. I like Calcite's group window functions more.

          Please let me know what you think about this change and if you have concerns.

          Best, Fabian

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3675 Hi @haohui, I updated the PR and refactored the rule into an abstract class that is extended by a rule for datastream and another one for dataset. * Please note *: I also removed support for `GROUP BY FLOOR/CEIL` on streams. My motivation is the following: it can be expressed with `TUMBLE` (which is even more expressive) We will add support for "non-windowed" aggregations (see PR #3646 by @shaoxuan-wang ) which can include the `GROUP BY FLOOR/CEIL` case. These aggregations will compute the same result but behave differently at runtime. Instead of collecting data and emitting a final result, they will be incrementally computed and send out more updates. I think the behavior of system is more predictable if we only translate Calcite's group window functions into DataStream group windows and all other grouped aggregates into non-windowed aggregates. We target the non-windowed aggregates for the next release. It will make the translation easier. The pattern matching of `FLOOR/CEIL` seems a bit fragile to me. I like Calcite's group window functions more. Please let me know what you think about this change and if you have concerns. Best, Fabian
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user haohui commented on the issue:

          https://github.com/apache/flink/pull/3675

          The PR looks pretty good. Thanks @fhueske for improving the documentation.

          +1 (non-binding)

          Show
          githubbot ASF GitHub Bot added a comment - Github user haohui commented on the issue: https://github.com/apache/flink/pull/3675 The PR looks pretty good. Thanks @fhueske for improving the documentation. +1 (non-binding)
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3675

          Thanks for the review @haohui!

          Just to make sure: You're OK with removing support for windows defined as `GROUP BY CEIL(rowtime() TO HOUR)`?

          Thanks, Fabian

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3675 Thanks for the review @haohui! Just to make sure: You're OK with removing support for windows defined as `GROUP BY CEIL(rowtime() TO HOUR)`? Thanks, Fabian
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user haohui commented on the issue:

          https://github.com/apache/flink/pull/3675

          Yes. I agree with you that `FLOOR` / `CEIL` are quite fragile and it is much clearer to use the `TUMBLING` / `HOP` / `SESSION` constructs.

          Show
          githubbot ASF GitHub Bot added a comment - Github user haohui commented on the issue: https://github.com/apache/flink/pull/3675 Yes. I agree with you that `FLOOR` / `CEIL` are quite fragile and it is much clearer to use the `TUMBLING` / `HOP` / `SESSION` constructs.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3675

          Thanks, will merge this PR

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3675 Thanks, will merge this PR
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3675

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3675
          Hide
          fhueske Fabian Hueske added a comment -

          Implemented with 635394751dce6e532fcd5a758c3d1bdb25303712

          Show
          fhueske Fabian Hueske added a comment - Implemented with 635394751dce6e532fcd5a758c3d1bdb25303712

            People

            • Assignee:
              fhueske Fabian Hueske
              Reporter:
              fhueske Fabian Hueske
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development