FLINK-6059

Reject DataSet<Row> and DataStream<Row> without RowTypeInformation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0
    • Fix Version/s: 1.3.0, 1.2.2
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      It is not possible to automatically extract proper type information for Row because it is not typed with generics and holds values in an Object[].
      Consequently, a Row is handled as GenericType<Row> unless a RowTypeInfo is explicitly specified.

      This can lead to unexpected behavior when converting a DataSet<Row> or DataStream<Row> into a Table. If the data set or data stream has a GenericType<Row>, each row is treated as an atomic type and converted into a single field.

      I think we should reject input types of GenericType<Row> when converting data sets and data streams and request a proper RowTypeInfo.
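      The behavior described above can be illustrated with a small dependency-free model. This is only a sketch: `TypeInfo`, `GenericInfo`, `RowInfo`, `Row`, and `RowTypeCheckSketch` are hypothetical stand-ins invented for this example, not Flink's classes, and the error message is made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Flink's type information classes; NOT the real API.
interface TypeInfo {}

// Models GenericTypeInfo<T>: an opaque type with no known field structure.
final class GenericInfo implements TypeInfo {
    final Class<?> clazz;
    GenericInfo(Class<?> clazz) { this.clazz = clazz; }
}

// Models RowTypeInfo: a row whose field types are declared explicitly.
final class RowInfo implements TypeInfo {
    final List<Class<?>> fieldTypes;
    RowInfo(Class<?>... fieldTypes) { this.fieldTypes = Arrays.asList(fieldTypes); }
}

// Placeholder for a row that keeps its values in an untyped Object[].
final class Row {
    final Object[] values;
    Row(Object... values) { this.values = values; }
}

final class RowTypeCheckSketch {
    // Behavior described in the issue: anything that is not an explicit
    // row type is atomic and therefore maps to a single table field.
    static int fieldCountBeforeFix(TypeInfo info) {
        if (info instanceof RowInfo) {
            return ((RowInfo) info).fieldTypes.size();
        }
        return 1;
    }

    // Proposed behavior: a generic type wrapping Row is rejected on input
    // instead of being silently converted into one field.
    static int fieldCountAfterFix(TypeInfo info) {
        if (info instanceof GenericInfo && ((GenericInfo) info).clazz == Row.class) {
            throw new IllegalArgumentException(
                "Generic Row type is not supported as input; specify explicit row type information.");
        }
        return fieldCountBeforeFix(info);
    }
}
```

      In this model, a `GenericInfo` for `Row` yields one field before the fix and an exception after it, while a `RowInfo` with two declared field types yields two fields in both cases.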

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user fhueske opened a pull request:

          https://github.com/apache/flink/pull/3546

          FLINK-6059 [table] Reject GenericType<Row> when converting DataSet or DataStream to Table

          When converting a `DataSet<Row>` or `DataStream<Row>` with `GenericTypeInfo<Row>` type into a `Table`, the row is treated as an atomic type. This behavior is always unexpected and confusing.

          With this change, converting a `DataSet<Row>` or `DataStream<Row>` with `GenericTypeInfo<Row>` into a `Table` fails. Instead, we ask for a `DataSet<Row>` or `DataStream<Row>` with a proper `RowTypeInfo`.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/fhueske/flink tableNoGenericRow

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3546.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3546


          commit 0064a9ed1270d1e139ff9a6b22df5cc95450ce59
          Author: Fabian Hueske <fhueske@apache.org>
          Date: 2017-03-15T12:24:42Z

          FLINK-6059 [table] Reject GenericType<Row> when converting DataSet or DataStream to Table.


          githubbot ASF GitHub Bot added a comment -

          Github user sunjincheng121 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3546#discussion_r106214186

          --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
          @@ -498,6 +505,10 @@ abstract class TableEnvironment(val config: TableConfig) {
          TableEnvironment.validateType(inputType)
          --- End diff --

          Can we do the `GenericTypeInfo` type check in the `TableEnvironment.validateType(inputType)` method? If we do so, we can remove the duplicate check. What do you think?
          Best,
          SunJincheng

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3546#discussion_r113740954

          --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
          @@ -498,6 +505,10 @@ abstract class TableEnvironment(val config: TableConfig) {
          TableEnvironment.validateType(inputType)
          --- End diff --

          The `validateType()` method is also used to check the output type, i.e., the type of a DataSet or DataStream which is created from a Table. I think we should allow `GenericType<Row>` in that case.
          Hence, I would keep the `validateType()` method as it is.
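          The design choice discussed here, keeping the shared check unchanged and rejecting generic rows only on the input path, might look roughly like the following model. Everything in it is a hypothetical sketch: `ValidationSketch`, `Kind`, `validateInputType`, and `validateOutputType` are invented names, and the body of `validateType` is a placeholder rather than what Flink's method actually checks.

```java
// Hypothetical model of the discussion; this is NOT Flink's code.
final class ValidationSketch {
    // Simplified classification of the type information attached to a
    // DataSet or DataStream.
    enum Kind { ROW_TYPE_INFO, GENERIC_ROW, ATOMIC }

    // Shared check used for both input and output types. The real checks
    // are not shown in the discussion; a null check stands in for them.
    static void validateType(Kind kind) {
        if (kind == null) {
            throw new IllegalArgumentException("Type must not be null.");
        }
    }

    // Input path (DataSet/DataStream to Table): shared check plus the
    // additional rejection of generic rows.
    static void validateInputType(Kind kind) {
        validateType(kind);
        if (kind == Kind.GENERIC_ROW) {
            throw new IllegalArgumentException(
                "Generic Row input is rejected; explicit row type information is required.");
        }
    }

    // Output path (Table to DataSet/DataStream): only the shared check,
    // so generic rows remain allowed.
    static void validateOutputType(Kind kind) {
        validateType(kind);
    }
}
```

          Moving the generic-row rejection into `validateType` would make the output path fail as well, which is the case the comment above wants to keep working.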

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3546

          Thanks for the review @sunjincheng121.
          I will merge this fix tomorrow.

          Cheers, Fabian

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3546

          fhueske Fabian Hueske added a comment -

          Fixed for 1.3.0 with c8eb55f17d64722bb600c1083a478ab99e53f4ec
          Fixed for 1.2.2 with fdb3f65f2d6595b88edae849ae6c848e5bbfaa2d


            People

            • Assignee:
              fhueske Fabian Hueske
              Reporter:
              fhueske Fabian Hueske