Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5441

Directly allow SQL queries on a Table

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      Right now a user has to register a table before it can be used in SQL queries. In order to allow more fluent programming we propose calling SQL directly on a table. An underscore can be used to reference the current table:

      myTable.sql("SELECT a, b, c FROM _ WHERE d = 12")
      

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user wuchong opened a pull request:

          https://github.com/apache/flink/pull/3107

          FLINK-5441 [table] Directly allow SQL queries on a Table

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [ ] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [ ] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          This PR allows calling SQL directly on a table :

          ```scala
          myTable.sql("SELECT a, b, c FROM _ WHERE d = 12")
          ```

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/wuchong/flink sql-FLINK-5441

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3107.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3107


          commit 528d7c2ad61b0247d296d083d3d4942621701cf8
          Author: Jark Wu <wuchong.wc@alibaba-inc.com>
          Date: 2017-01-12T14:26:52Z

          FLINK-5441 [table] Directly allow SQL queries on a Table


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user wuchong opened a pull request: https://github.com/apache/flink/pull/3107 FLINK-5441 [table] Directly allow SQL queries on a Table Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [ ] General The pull request references the related JIRA issue (" [FLINK-XXX] Jira title text") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [ ] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [ ] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed This PR allows calling SQL directly on a table : ```scala myTable.sql("SELECT a, b, c FROM _ WHERE d = 12") ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/wuchong/flink sql- FLINK-5441 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3107.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3107 commit 528d7c2ad61b0247d296d083d3d4942621701cf8 Author: Jark Wu <wuchong.wc@alibaba-inc.com> Date: 2017-01-12T14:26:52Z FLINK-5441 [table] Directly allow SQL queries on a Table
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          Hi @KurtYoung , thank your for reviewing. I'm not sure about that. Do you mean omitting the "FROM" syntax ? But it will be hard to validate SQL syntax. Because the query can follow a WHERE clause or JOIN/UNION with other tables. For example, it's wired and hard to validate the query like this:

          ```
          mytable.sql("SELECT * WHERE a > 13 UNION SELECT * FROM TABLE_B")
          ```

          I think allowing query on other tables is fine here. What do you think ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 Hi @KurtYoung , thank your for reviewing. I'm not sure about that. Do you mean omitting the "FROM" syntax ? But it will be hard to validate SQL syntax. Because the query can follow a WHERE clause or JOIN/UNION with other tables. For example, it's wired and hard to validate the query like this: ``` mytable.sql("SELECT * WHERE a > 13 UNION SELECT * FROM TABLE_B") ``` I think allowing query on other tables is fine here. What do you think ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user KurtYoung commented on the issue:

          https://github.com/apache/flink/pull/3107

          I still think this is weird. Currently all API from `Table` is somehow act on the table object, just like a normal OOP program. After introducing `sql` method to `Table`, it can do something completely unrelated to the current object.

          Show
          githubbot ASF GitHub Bot added a comment - Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/3107 I still think this is weird. Currently all API from `Table` is somehow act on the table object, just like a normal OOP program. After introducing `sql` method to `Table`, it can do something completely unrelated to the current object.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          @KurtYoung I agree that this syntax can be misused like `table.sql("SELECT * FROM otherTable")`. But right now it is very inconvenient to always register a table first, especially if you want to define multiple queries subsequently.

          We could also think about `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`. Other suggestions are very welcome.

          Another idea that just came to my mind. We could also use the `toString` method of a Table. This method could implicitly register the Table under a unique name and return that. Then we could support `env.sql(s"SELECT * FROM $table JOIN $otherTable")`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 @KurtYoung I agree that this syntax can be misused like `table.sql("SELECT * FROM otherTable")`. But right now it is very inconvenient to always register a table first, especially if you want to define multiple queries subsequently. We could also think about `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`. Other suggestions are very welcome. Another idea that just came to my mind. We could also use the `toString` method of a Table. This method could implicitly register the Table under a unique name and return that. Then we could support `env.sql(s"SELECT * FROM $table JOIN $otherTable")`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          What about using slf4j syntax `env.sql("SELECT * FROM {} JOIN {}", table1, table2)` ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 What about using slf4j syntax `env.sql("SELECT * FROM {} JOIN {}", table1, table2)` ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          Hi, I like @twalthr's suggestion to use `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`.
          I'd also be fine with slf4j syntax as @wuchong proposed.

          What do you think @KurtYoung?

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 Hi, I like @twalthr's suggestion to use `env.sql("SELECT * FROM _ JOIN _", table, otherTable)`. I'd also be fine with slf4j syntax as @wuchong proposed. What do you think @KurtYoung?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user KurtYoung commented on the issue:

          https://github.com/apache/flink/pull/3107

          Hi @fhueske, both solutions are fine with me.

          Show
          githubbot ASF GitHub Bot added a comment - Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/3107 Hi @fhueske, both solutions are fine with me.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          What do you think about my solution of `env.sql(s"SELECT * FROM $table JOIN $otherTable")`? It would be much more readable.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 What do you think about my solution of `env.sql(s"SELECT * FROM $table JOIN $otherTable")`? It would be much more readable.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          Hi @twalthr , the `env.sql(s"SELECT * FROM $table JOIN $otherTable")` is a nice way. But how to handle the table's table name and register it to env ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 Hi @twalthr , the `env.sql(s"SELECT * FROM $table JOIN $otherTable")` is a nice way. But how to handle the table's table name and register it to env ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          Agreed, for Scala the inlined variant looks really good. Like @wuchong,
          I'm curious how we can intercept the inlined string arguments. Do you know how to do that @twalthr?

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 Agreed, for Scala the inlined variant looks really good. Like @wuchong, I'm curious how we can intercept the inlined string arguments. Do you know how to do that @twalthr?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          We don't have to intercept inline string arguments. We just have to make sure that `Table.toString()` returns an unique identifier that the method previously registered in it's table environment. `Table.toString()` should then always return the same identifier.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 We don't have to intercept inline string arguments. We just have to make sure that `Table.toString()` returns an unique identifier that the method previously registered in it's table environment. `Table.toString()` should then always return the same identifier.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          @twalthr do you mean register table in his `toString` method ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 @twalthr do you mean register table in his `toString` method ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fpompermaier commented on the issue:

          https://github.com/apache/flink/pull/3107

          It's for this reason that I think that Flink needs a "source catalog"
          somewhere..

          On 22 February 2017 at 09:35, twalthr <notifications@github.com> wrote:

          > We don't have to intercept inline string arguments. We just have to make
          > sure that Table.toString() returns an unique identifier that the method
          > previously registered in it's table environment. Table.toString() should
          > then always return the same identifier.
          >
          > —
          > You are receiving this because you are subscribed to this thread.
          > Reply to this email directly, view it on GitHub
          > <https://github.com/apache/flink/pull/3107#issuecomment-281603488>, or mute
          > the thread
          > <https://github.com/notifications/unsubscribe-auth/ADmeJYvF3DJZFWQ2oYcR_mUgC-2WFll8ks5re_NdgaJpZM4Lhz5Y>
          > .
          >

          Show
          githubbot ASF GitHub Bot added a comment - Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/3107 It's for this reason that I think that Flink needs a "source catalog" somewhere.. On 22 February 2017 at 09:35, twalthr <notifications@github.com> wrote: > We don't have to intercept inline string arguments. We just have to make > sure that Table.toString() returns an unique identifier that the method > previously registered in it's table environment. Table.toString() should > then always return the same identifier. > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub > < https://github.com/apache/flink/pull/3107#issuecomment-281603488 >, or mute > the thread > < https://github.com/notifications/unsubscribe-auth/ADmeJYvF3DJZFWQ2oYcR_mUgC-2WFll8ks5re_NdgaJpZM4Lhz5Y > > . >
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          An external catalog would not help in this case. It's actually quite the opposite. The issue is to run a SQL query on an ad-hoc table without the need to explicitly add it to a catalog.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 An external catalog would not help in this case. It's actually quite the opposite. The issue is to run a SQL query on an ad-hoc table without the need to explicitly add it to a catalog.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          @wuchong if `toString` is called for the first time. It calls `tEnv.registerTableInternal(this)` which uses an atomic counter and generates a unique identifier such as `UnnamedTable$232`. This identifier is returned and saved in the table. Every further call of toString return this identifier.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 @wuchong if `toString` is called for the first time. It calls `tEnv.registerTableInternal(this)` which uses an atomic counter and generates a unique identifier such as `UnnamedTable$232`. This identifier is returned and saved in the table. Every further call of toString return this identifier.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          Thank you @twalthr , I like the idea. So Java users can also use `env.sql("SELECT * FROM " + table1 + " JOIN " + table2)` as a shortcut. What do you think @fhueske ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 Thank you @twalthr , I like the idea. So Java users can also use `env.sql("SELECT * FROM " + table1 + " JOIN " + table2)` as a shortcut. What do you think @fhueske ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          Hmm, I think registering tables in `toString()` (which IMO should be free of side effects) is not very clean, but I have to admit that the API looks good. So, I would be OK doing this like that.

          Would we also take care to unregister the tables after the query was optimized?

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 Hmm, I think registering tables in `toString()` (which IMO should be free of side effects) is not very clean, but I have to admit that the API looks good. So, I would be OK doing this like that. Would we also take care to unregister the tables after the query was optimized?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          Do we have to take care of unregistering the tables? Right now `fromDataSet/fromDataStream` do also not unregister their unique names.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 Do we have to take care of unregistering the tables? Right now `fromDataSet/fromDataStream` do also not unregister their unique names.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          But what's the point of keeping tables that will never be used again?

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 But what's the point of keeping tables that will never be used again?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          I think the point is that we don't know if the table will ever be used again. If we unregister them after optimization, we can not have multiple `toDataStream()` calls for one table.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 I think the point is that we don't know if the table will ever be used again. If we unregister them after optimization, we can not have multiple `toDataStream()` calls for one table.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3107

          Yes, you are right. Forgot that we lazily translate the plans. :-/
          Let's keep the tables then.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3107 Yes, you are right. Forgot that we lazily translate the plans. :-/ Let's keep the tables then.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          @wuchong any plans to update this PR?

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 @wuchong any plans to update this PR?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          @twalthr I have updated this PR regarding to the idea you proposed that `env.sql(s"SELECT * FROM $table JOIN $otherTable")`

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 @twalthr I have updated this PR regarding to the idea you proposed that `env.sql(s"SELECT * FROM $table JOIN $otherTable")`
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          Thanks for the quick update @wuchong. The code looks good to merge. Can you add some documentation to the SQL section?

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 Thanks for the quick update @wuchong. The code looks good to merge. Can you add some documentation to the SQL section?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3107

          updated the documentation.

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 updated the documentation.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3107

          Thanks @wuchong. Will merge this.

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3107 Thanks @wuchong. Will merge this.
          Hide
          twalthr Timo Walther added a comment -

          Fixed in 1.3.0: f333723bcdaaa1cdc99dce16b00151d1c5365869

          Show
          twalthr Timo Walther added a comment - Fixed in 1.3.0: f333723bcdaaa1cdc99dce16b00151d1c5365869
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3107

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3107

            People

            • Assignee:
              jark Jark Wu
              Reporter:
              twalthr Timo Walther
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development