Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5358

Support RowTypeInfo extraction in TypeExtractor by Row instance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels:
      None

      Description

      Row[] data = new Row[]{};
      TypeInformation<X> typeInfo = TypeExtractor.getForObject(data[0]);
      

      method getForObject wraps it into

      GenericTypeInfo<org.apache.flink.types.Row>
      

      the method should return RowTypeInfo

        Issue Links

          Activity

          Hide
          fhueske Fabian Hueske added a comment -

          Implemented for 1.3 with 2af939a10288348eedb56fc0959daee77c9cdcf3

          Show
          fhueske Fabian Hueske added a comment - Implemented for 1.3 with 2af939a10288348eedb56fc0959daee77c9cdcf3
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3027

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3027
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3027

          merging

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3027 merging
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user twalthr commented on the issue:

          https://github.com/apache/flink/pull/3027

          +1 to merge

          Show
          githubbot ASF GitHub Bot added a comment - Github user twalthr commented on the issue: https://github.com/apache/flink/pull/3027 +1 to merge
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3027

          Thanks @tonycox. PR is good to merge.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3027 Thanks @tonycox. PR is good to merge.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3027#discussion_r93937117

          — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java —
          @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception

          { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); }

          + @Test
          + public void testRow() {
          — End diff –

          Oh, I'm sorry. I overlooked that check.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93937117 — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java — @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } + @Test + public void testRow() { — End diff – Oh, I'm sorry. I overlooked that check.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3027#discussion_r93848593

          — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java —
          @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception

          { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); }

          + @Test
          + public void testRow() {
          — End diff –

          in different test method?
          there is already Row with a null field check

          `Row nullRow = new Row(2);
          TypeInformation<Row> genericRowInfo = TypeExtractor.getForObject(nullRow);
          Assert.assertEquals(genericRowInfo, new GenericTypeInfo<>(Row.class));`

          Show
          githubbot ASF GitHub Bot added a comment - Github user tonycox commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93848593 — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java — @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } + @Test + public void testRow() { — End diff – in different test method? there is already Row with a null field check `Row nullRow = new Row(2); TypeInformation<Row> genericRowInfo = TypeExtractor.getForObject(nullRow); Assert.assertEquals(genericRowInfo, new GenericTypeInfo<>(Row.class));`
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3027

          Thanks @tonycox. The changes look good. I'd would add one more test case to cover the case of `Row` objects with `null` fields.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3027 Thanks @tonycox. The changes look good. I'd would add one more test case to cover the case of `Row` objects with `null` fields.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3027#discussion_r93805887

          — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java —
          @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception

          { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); }

          + @Test
          + public void testRow() {
          — End diff –

          Please add a check for a `Row` with a `null` field.

          Show
          githubbot ASF GitHub Bot added a comment - Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93805887 — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java — @@ -345,8 +346,25 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } + @Test + public void testRow() { — End diff – Please add a check for a `Row` with a `null` field.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3027

          Hi @fhueske could you look at this PR?

          Show
          githubbot ASF GitHub Bot added a comment - Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3027 Hi @fhueske could you look at this PR?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3027#discussion_r93159264

          — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java —
          @@ -345,8 +346,22 @@ public CustomType cross(CustomType first, Integer second) throws Exception

          { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); }

          + @Test
          + public void testRow() {
          + Row row = new Row(2);
          + row.setField(0, "string");
          + row.setField(1, 15);
          + TypeInformation<Row> rowInfo = TypeExtractor.getForObject(row);
          + Assert.assertEquals(rowInfo.getClass(), RowTypeInfo.class);
          + Assert.assertEquals(2, rowInfo.getArity());
          +
          + Row nullRow = new Row(2);
          + TypeInformation<Row> genericRowInfo = TypeExtractor.getForObject(nullRow);
          + Assert.assertEquals(genericRowInfo, new GenericTypeInfo<>(Row.class));
          + int arity = genericRowInfo.getArity();
          + genericRowInfo.hashCode();
          — End diff –

          Why do we call `getArity()` and `hashCode()` here ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93159264 — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java — @@ -345,8 +346,22 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } + @Test + public void testRow() { + Row row = new Row(2); + row.setField(0, "string"); + row.setField(1, 15); + TypeInformation<Row> rowInfo = TypeExtractor.getForObject(row); + Assert.assertEquals(rowInfo.getClass(), RowTypeInfo.class); + Assert.assertEquals(2, rowInfo.getArity()); + + Row nullRow = new Row(2); + TypeInformation<Row> genericRowInfo = TypeExtractor.getForObject(nullRow); + Assert.assertEquals(genericRowInfo, new GenericTypeInfo<>(Row.class)); + int arity = genericRowInfo.getArity(); + genericRowInfo.hashCode(); — End diff – Why do we call `getArity()` and `hashCode()` here ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3027#discussion_r93159119

          — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java —
          @@ -345,8 +346,22 @@ public CustomType cross(CustomType first, Integer second) throws Exception

          { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); }

          + @Test
          + public void testRow() {
          + Row row = new Row(2);
          + row.setField(0, "string");
          + row.setField(1, 15);
          + TypeInformation<Row> rowInfo = TypeExtractor.getForObject(row);
          + Assert.assertEquals(rowInfo.getClass(), RowTypeInfo.class);
          + Assert.assertEquals(2, rowInfo.getArity());
          — End diff –

          Can we also test the nested type info? such as:
          ```java
          assertEquals(
          new RowTypeInfo(
          BasicTypeInfo.STRING_TYPE_INFO,
          BasicTypeInfo.INT_TYPE_INFO),
          rowInfo);
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user wuchong commented on a diff in the pull request: https://github.com/apache/flink/pull/3027#discussion_r93159119 — Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/TypeExtractorTest.java — @@ -345,8 +346,22 @@ public CustomType cross(CustomType first, Integer second) throws Exception { Assert.assertFalse(TypeExtractor.getForClass(PojoWithNonPublicDefaultCtor.class) instanceof PojoTypeInfo); } + @Test + public void testRow() { + Row row = new Row(2); + row.setField(0, "string"); + row.setField(1, 15); + TypeInformation<Row> rowInfo = TypeExtractor.getForObject(row); + Assert.assertEquals(rowInfo.getClass(), RowTypeInfo.class); + Assert.assertEquals(2, rowInfo.getArity()); — End diff – Can we also test the nested type info? such as: ```java assertEquals( new RowTypeInfo( BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO), rowInfo); ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user tonycox opened a pull request:

          https://github.com/apache/flink/pull/3027

          FLINK-5358 add RowTypeInfo exctraction in TypeExtractor

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [ ] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [ ] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tonycox/flink rowTypeExtractor

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3027.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3027


          commit 7aa9992e681761e255f3a24492f202354e77ab2e
          Author: tonycox <anton_solovev@epam.com>
          Date: 2016-12-16T16:55:40Z

          FLINK-5358 add RowTypeInfo exctraction in TypeExtractor


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user tonycox opened a pull request: https://github.com/apache/flink/pull/3027 FLINK-5358 add RowTypeInfo exctraction in TypeExtractor Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide] ( http://flink.apache.org/how-to-contribute.html ). In addition to going through the list, please provide a meaningful description of your changes. [ ] General The pull request references the related JIRA issue (" [FLINK-XXX] Jira title text") The pull request addresses only one issue Each commit in the PR has a meaningful commit message (including the JIRA id) [ ] Documentation Documentation has been added for new functionality Old documentation affected by the pull request has been updated JavaDoc for public methods has been added [ ] Tests & Build Functionality added by the pull request is covered by tests `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/tonycox/flink rowTypeExtractor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3027.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3027 commit 7aa9992e681761e255f3a24492f202354e77ab2e Author: tonycox <anton_solovev@epam.com> Date: 2016-12-16T16:55:40Z FLINK-5358 add RowTypeInfo exctraction in TypeExtractor
          Hide
          fhueske Fabian Hueske added a comment -

          Very good point Jark Wu, in that case the type extraction would not work. We could only default to GenericType which gives different behavior that RowType because it is not a CompositeType and log a warning to explicitly define the RowTypeInfo.

          Show
          fhueske Fabian Hueske added a comment - Very good point Jark Wu , in that case the type extraction would not work. We could only default to GenericType which gives different behavior that RowType because it is not a CompositeType and log a warning to explicitly define the RowTypeInfo .
          Hide
          jark Jark Wu added a comment -

          If one of the fields is null, what is the TypeInformation of this field?

          Show
          jark Jark Wu added a comment - If one of the fields is null, what is the TypeInformation of this field?
          Hide
          fhueske Fabian Hueske added a comment -

          Hi Anton Solovev,

          I think we need to make the JIRA title a bit more concrete.
          Extracting the correct RowTypeInfo using the TypeExtractor does only work if there is a concrete Row instance. It does not work to extract the type from a function signature such as FlatMap<Row, Row>.

          In order to extract the TypeInformation for a concrete Row instance, the type extractor would need to extract the type of all objects hold in the fields array and instantiate a RowTypeInformation with these types. If all Row s in a collection have the same types for all fields, the approach to extract the type from the first object should be OK. The same assumption is done for every other type.

          Show
          fhueske Fabian Hueske added a comment - Hi Anton Solovev , I think we need to make the JIRA title a bit more concrete. Extracting the correct RowTypeInfo using the TypeExtractor does only work if there is a concrete Row instance. It does not work to extract the type from a function signature such as FlatMap<Row, Row> . In order to extract the TypeInformation for a concrete Row instance, the type extractor would need to extract the type of all objects hold in the fields array and instantiate a RowTypeInformation with these types. If all Row s in a collection have the same types for all fields, the approach to extract the type from the first object should be OK. The same assumption is done for every other type.
          Hide
          tonycox Anton Solovev added a comment - - edited

          Any ExecutionEnvironment exctracts type from TypeExtractor by first element in collection in such methods #fromCollection(Collection<X> data), #fromElements(X... data)
          I think we need rework them either or just don't use them with Row to generate DataSet/DataStream
          Fabian Hueske, Jark Wu what do you think?

          Show
          tonycox Anton Solovev added a comment - - edited Any ExecutionEnvironment exctracts type from TypeExtractor by first element in collection in such methods #fromCollection(Collection<X> data) , #fromElements(X... data) I think we need rework them either or just don't use them with Row to generate DataSet / DataStream Fabian Hueske , Jark Wu what do you think?

            People

            • Assignee:
              tonycox Anton Solovev
              Reporter:
              tonycox Anton Solovev
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development