Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3138

RelStructuredTypeFlattener doesn't restructure ROW type fields

    XMLWordPrintableJSON

    Details

      Description

      1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support ROW type. However, ROW type is flattened by
      RelStructuredTypeFlattener just like struct. So when user queries one column with complex type ROW, after flattening and restructuring
      top level project returns reference to first inner primitive field of the requested column.

      2) Another finding is related to ITEM expression applied to array of structs column. For example, let's imagine a table with column of
      type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array column
      from Scan and 1 is index literal. Current flattener generates two field acess expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account
      that ITEM($0, 1).b returns struct which also should be flattened.

      3) In some cases applying of ITEM after flattenning is no longer possible. For example, consider column with type
      STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and Calcite creates ITEM($0,'b').
      After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and the old projection ITEM($0,'b') can't
      be applied anymore. So now it should be converted to refer only subset of fields ($1,$2) from flattening project.

      UPDATES IN EXPECTED TEST RESULTS:

      ----------------------------------------------------------------------------------
      TEST CASE: SqlToRelConverterTest.testNestedColumnType()

      select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc'
      

      OLD RESULT:

      LogicalProject(ZIP=[$4])
        LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
          LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
            LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
      

      1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project,
      however in original query filtering should be done by CITY field.
      2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original
      query requested ZIP field.

      UPDATED RESULT:

      LogicalProject(ZIP=[$3])
        LogicalFilter(condition=[=($2, 'abc')])
          LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
            LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
      

      ----------------------------------------------------------------------------------
      TEST CASE: SqlToRelConverterTest.testStructTypeAlias()

      select t.r AS myRow from (select row(row(1)) r from dept) t 
      

      OLD RESULT:

      LogicalProject(MYROW$$0$$0=[1])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
      but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0),
      although the type doesn't match type returned by row(row(1)) expression.
      2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'.

      UPDATED RESULT:

      LogicalProject(MYROW=[ROW(ROW($0))])
        LogicalProject(MYROW$$0$$0=[1])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      ----------------------------------------------------------------------------------

      TEST CASE: SqlToRelConverterTest.testFlattenRecords()

      select employees[1] from dept_nested 
      

      OLD RESULT:

      LogicalProject(EXPR$0=[$0])
        LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
      

      1. Given selection of element by index from array, which by definition contain elements of type:

      STRUCT<EMPNO:INT,
             ENAME:VARCHAR(10),
             DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
                                               DESC:VARCHAR(20),
      				         OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
      

      First problem is that the intermediate project on top of the scan expected to return only flattened types.
      But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array
      field named 'SKILLS'.

      2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead
      of whole array element of type struct.

      UPDATED RESULT:

      LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))])
        LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
      

      Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java are more explicit and visible in PR.

        Attachments

        1. ROW_repro.patch
          3 kB
          Haisheng Yuan

          Issue Links

            Activity

              People

              • Assignee:
                ihuzenko Igor Guzenko
                Reporter:
                hyuan Haisheng Yuan
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 7h 20m
                  7h 20m