Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1297

RelBuilder should rename fields without creating an identity Project

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.9.0
    • Component/s: core
    • Labels:
      None

      Description

      RelBuilder does not, and should not, translate identity projects, even if they rename fields. But when building further relational expressions (e.g. a project) the fields should appear to have their new name.

      This requires some tricky implementation. In the RelBuilder, the top RelNode on the stack may have a particular row type (say fields (x, y, z)) yet if will have a frame that indicates that the field names from the user's perspective are (a, b, c).

        Activity

        Hide
        julianhyde Julian Hyde added a comment - - edited

        Original description:

        RelBuilder does not translate identity projects even if they rename fields

        I don't understand. Can you elaborate?

        Show
        julianhyde Julian Hyde added a comment - - edited Original description: RelBuilder does not translate identity projects even if they rename fields I don't understand. Can you elaborate?
        Hide
        jark Jark Wu added a comment -

        Say we have a Table(f0, f1, f2) , and we want to make aliases for every field using RelBuilder :

        relBuilder.project(
              Lists.newArrayList(
                relBuilder.alias(relBuilder.field("f0"), "a"),
                relBuilder.alias(relBuilder.field("f1"), "b"),
                relBuilder.alias(relBuilder.field("f2"), "c")),
              Lists.newArrayList("a", "b", "c")
            );
        

        But it will not change anything, we do not get the new project.

        And I create a PR for fix this : https://github.com/apache/calcite/pull/251

        Show
        jark Jark Wu added a comment - Say we have a Table(f0, f1, f2) , and we want to make aliases for every field using RelBuilder : relBuilder.project( Lists.newArrayList( relBuilder.alias(relBuilder.field( "f0" ), "a" ), relBuilder.alias(relBuilder.field( "f1" ), "b" ), relBuilder.alias(relBuilder.field( "f2" ), "c" )), Lists.newArrayList( "a" , "b" , "c" ) ); But it will not change anything, we do not get the new project. And I create a PR for fix this : https://github.com/apache/calcite/pull/251
        Hide
        julianhyde Julian Hyde added a comment -

        This is intentional. RelBuilder optimizes as it builds expressions. So, it removes trivial projects. It also optimizes filters, and removes trivial filters.

        RelBuilder will remember the field names, so you can refer to fields a, b, c when you build the next expression.

        Did you run the tests against your PR? I think it will break stuff.

        Show
        julianhyde Julian Hyde added a comment - This is intentional. RelBuilder optimizes as it builds expressions. So, it removes trivial projects. It also optimizes filters, and removes trivial filters. RelBuilder will remember the field names, so you can refer to fields a, b, c when you build the next expression. Did you run the tests against your PR? I think it will break stuff.
        Hide
        jark Jark Wu added a comment - - edited

        However, when I build next expression, such as relBuilder.field("a"), it throws

        field [a] not found; input fields are: [f0, f1, f2]
        
        Show
        jark Jark Wu added a comment - - edited However, when I build next expression, such as relBuilder.field("a") , it throws field [a] not found; input fields are: [f0, f1, f2]
        Hide
        julianhyde Julian Hyde added a comment -

        Ah, that's wrong. RelBuilder.field must use the field names just created (even if that thing was optimized away).

        To implement, field(int, int, String) would need to use stack.peek() rather than input.getRowType(). field(String, String) already does the right thing.

        Show
        julianhyde Julian Hyde added a comment - Ah, that's wrong. RelBuilder.field must use the field names just created (even if that thing was optimized away). To implement, field(int, int, String) would need to use stack.peek() rather than input.getRowType() . field(String, String) already does the right thing.
        Hide
        jark Jark Wu added a comment -

        I'm sorry to disturb you again. I do not get what you mean...

        I tried to use RelBuilder.field("a", "a") or RelBuilder.field("a", "f0"), but it throws no relation with alias 'a'; aliases are: [_DataSetTable_0]. And I debug into field(String, String) and find that the stack.peek returns a LogicalTableScan (the stack size is 1), not a Project.

        My question is how can I refer to the field using the alias name, after using an AS ?

        I have been confused that why these code exist L788~L799 [1] ? It seems duplicate with the following codes.

        [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L788-L790

        Show
        jark Jark Wu added a comment - I'm sorry to disturb you again. I do not get what you mean... I tried to use RelBuilder.field("a", "a") or RelBuilder.field("a", "f0"), but it throws no relation with alias 'a'; aliases are: [_DataSetTable_0] . And I debug into field(String, String) and find that the stack.peek returns a LogicalTableScan (the stack size is 1), not a Project. My question is how can I refer to the field using the alias name, after using an AS ? I have been confused that why these code exist L788~L799 [1] ? It seems duplicate with the following codes. [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L788-L790
        Hide
        julianhyde Julian Hyde added a comment -

        You're right. I didn't mean to duplicate the code. I think I introduced it while merging CALCITE-816.

        In https://github.com/apache/calcite/commit/b5b28f0b2d263a2e2fb6894d67c8666549f4d053, I should have added a test case to RelBuilderTest, and the merge error would have been spotted.

        The changes I was talking about making to field(int, int, String) would go further. We'd never create an identity project, even if the fields were different. We'd get the "virtual" row type from the Frame data structure, not call RelNode.getRowType(), and the virtual row type would have the field names that were just projected by the user.

        Show
        julianhyde Julian Hyde added a comment - You're right. I didn't mean to duplicate the code. I think I introduced it while merging CALCITE-816 . In https://github.com/apache/calcite/commit/b5b28f0b2d263a2e2fb6894d67c8666549f4d053 , I should have added a test case to RelBuilderTest, and the merge error would have been spotted. The changes I was talking about making to field(int, int, String) would go further. We'd never create an identity project, even if the fields were different. We'd get the "virtual" row type from the Frame data structure, not call RelNode.getRowType(), and the virtual row type would have the field names that were just projected by the user.
        Hide
        julianhyde Julian Hyde added a comment -

        Jark Wu, Are you interested in developing a patch for this?

        Show
        julianhyde Julian Hyde added a comment - Jark Wu , Are you interested in developing a patch for this?
        Hide
        jark Jark Wu added a comment -

        Julian Hyde, I'm glad to work on it. And I have updated the PR [1] according to my understanding.

        [1]: https://github.com/apache/calcite/pull/251

        Show
        jark Jark Wu added a comment - Julian Hyde , I'm glad to work on it. And I have updated the PR [1] according to my understanding. [1] : https://github.com/apache/calcite/pull/251
        Hide
        julianhyde Julian Hyde added a comment - - edited
        Show
        julianhyde Julian Hyde added a comment - - edited Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/2193c6e6 . Jark Wu , thanks for the PR!
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.9.0 (2016-09-22)

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.9.0 (2016-09-22)
        Hide
        jbalint@gmail.com Jess Balint added a comment -

        I'm unclear how/if the test added here is correct:

          RelNode root =
              builder.scan("DEPT")
                  .project(builder.alias(builder.field(0), "a"),
                      builder.alias(builder.field(1), "b"),
                      builder.alias(builder.field(2), "c"))
                  .as("t1")
                  .project(builder.field("a"),
                      builder.field("t1", "c"))
                  .build();
          final String expected = "LogicalProject(DEPTNO=[$0], LOC=[$2])\n"
              + "  LogicalTableScan(table=[[scott, DEPT]])\n";
        

        Why is RelBuilder permitted to create a tree with rel type having field names that were explicitly overridden with project()?

        Specifically, this code:

                // create "virtual" row type for project only rename fields
                stack.pop();
                stack.push(new Frame(frame.rel, fields.build()));
        

        If this is done internally, it should be reconciled at RelBuilder.build() time to ensure the resulting rel node's type fields match those requested in the projection.

        Show
        jbalint@gmail.com Jess Balint added a comment - I'm unclear how/if the test added here is correct: RelNode root = builder.scan( "DEPT" ) .project(builder.alias(builder.field(0), "a" ), builder.alias(builder.field(1), "b" ), builder.alias(builder.field(2), "c" )) .as( "t1" ) .project(builder.field( "a" ), builder.field( "t1" , "c" )) .build(); final String expected = "LogicalProject(DEPTNO=[$0], LOC=[$2])\n" + " LogicalTableScan(table=[[scott, DEPT]])\n" ; Why is RelBuilder permitted to create a tree with rel type having field names that were explicitly overridden with project() ? Specifically, this code: // create "virtual" row type for project only rename fields stack.pop(); stack.push( new Frame(frame.rel, fields.build())); If this is done internally, it should be reconciled at RelBuilder.build() time to ensure the resulting rel node's type fields match those requested in the projection.
        Hide
        julianhyde Julian Hyde added a comment -

        project() is permitted to no-op. If, for instance, the expressions are the identity projection.

        RelBuilder makes no guarantees about the field names of the produced RelNode so that it can do these kinds of micro-optimization.

        By the way, as part of CALCITE-1583 I am adding a new RelBuilder.optimize method that optimizes the top of the stack and may even change some of the output fields' types from nullable to not-nullable. Consider SELECT * FROM emp LEFT JOIN dept USING (deptno) WHERE dept.name LIKE 'S%'. The Filter can be merged into the left Join, making it an inner Join and making the fields of dept non-nullable.

        I considered doing this in RelBuilder.filter but I thought it would be too surprising for filter to change the field types. However filter has always, and still does, sometimes change field names.

        Show
        julianhyde Julian Hyde added a comment - project() is permitted to no-op. If, for instance, the expressions are the identity projection. RelBuilder makes no guarantees about the field names of the produced RelNode so that it can do these kinds of micro-optimization. By the way, as part of CALCITE-1583 I am adding a new RelBuilder.optimize method that optimizes the top of the stack and may even change some of the output fields' types from nullable to not-nullable. Consider SELECT * FROM emp LEFT JOIN dept USING (deptno) WHERE dept.name LIKE 'S%' . The Filter can be merged into the left Join , making it an inner Join and making the fields of dept non-nullable. I considered doing this in RelBuilder.filter but I thought it would be too surprising for filter to change the field types. However filter has always, and still does, sometimes change field names.
        Hide
        julianhyde Julian Hyde added a comment -

        Jess Balint, Regarding that test case. It is checking that if you have named a field "a" in a RelBuilder you can later refer to it as field "a" from the same RelBuilder. No guarantees about what that field will be called in the resulting RelNode.

        Show
        julianhyde Julian Hyde added a comment - Jess Balint , Regarding that test case. It is checking that if you have named a field "a" in a RelBuilder you can later refer to it as field "a" from the same RelBuilder . No guarantees about what that field will be called in the resulting RelNode .

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            jark Jark Wu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development