Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support ROW type. However, ROW type is flattened by
RelStructuredTypeFlattener just like struct. So when user queries one column with complex type ROW, after flattening and restructuring
top level project returns reference to first inner primitive field of the requested column.
2) Another finding is related to ITEM expression applied to array of structs column. For example, let's imagine a table with column of
type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array column
from Scan and 1 is index literal. Current flattener generates two field acess expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account
that ITEM($0, 1).b returns struct which also should be flattened.
3) In some cases applying of ITEM after flattenning is no longer possible. For example, consider column with type
STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and Calcite creates ITEM($0,'b').
After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and the old projection ITEM($0,'b') can't
be applied anymore. So now it should be converted to refer only subset of fields ($1,$2) from flattening project.
UPDATES IN EXPECTED TEST RESULTS:
----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testNestedColumnType()
select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc'
OLD RESULT:
LogicalProject(ZIP=[$4])
LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project,
however in original query filtering should be done by CITY field.
2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original
query requested ZIP field.
UPDATED RESULT:
LogicalProject(ZIP=[$3])
LogicalFilter(condition=[=($2, 'abc')])
LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE])
LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testStructTypeAlias()
select t.r AS myRow from (select row(row(1)) r from dept) t
OLD RESULT:
LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0),
although the type doesn't match type returned by row(row(1)) expression.
2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'.
UPDATED RESULT:
LogicalProject(MYROW=[ROW(ROW($0))]) LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testFlattenRecords()
select employees[1] from dept_nested
OLD RESULT:
LogicalProject(EXPR$0=[$0]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
1. Given selection of element by index from array, which by definition contain elements of type:
STRUCT<EMPNO:INT, ENAME:VARCHAR(10), DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), DESC:VARCHAR(20), OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
First problem is that the intermediate project on top of the scan expected to return only flattened types.
But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array
field named 'SKILLS'.
2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead
of whole array element of type struct.
UPDATED RESULT:
LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java are more explicit and visible in PR.
Attachments
Attachments
Issue Links
- blocks
-
DRILL-7253 Read Hive struct w/o nulls
- Resolved
- contains
-
CALCITE-2963 IndexOutOfBoundsException in RelStructuredTypeFlattener when accessing nested fields
- Resolved
-
CALCITE-2221 RelRoot.project does not take into account flattened record type
- Resolved
- duplicates
-
CALCITE-3582 Make RelStructuredTypeFlattener configurable in PlannerImpl
- Closed
- links to