Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Prior HIVE-23100 Not null constraints affected newly created RexNode type nullability.
Nullability enables the subquery rewrite algorithm to generate more optimal plan.
https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324
Example:
explain cbo select ws_sales_price from web_sales, customer, item where ws_bill_customer_sk = c_customer_sk and ws_item_sk = i_item_sk and ( c_customer_sk = 1 or i_item_id in (select i_item_id from item where i_item_sk in (2, 3) ) );
Without not null constraints
HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], c=[$5], i_item_id0=[$6], literalTrue=[$7]) HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer])
With not null constraints
HiveProject(ws_sales_price=[$2]) HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))]) HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], i_item_id0=[$5], literalTrue=[$6]) HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], ws_sales_price=[$20]) HiveFilter(condition=[IS NOT NULL($3)]) HiveTableScan(table=[[default, web_sales]], table:alias=[web_sales]) HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(i_item_id=[$0], literalTrue=[true]) HiveAggregate(group=[{1}]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c_customer_sk=[$0]) HiveTableScan(table=[[default, customer]], table:alias=[customer])
In the first plan when not null constraints was ignored there is an extra item table join without join condition:
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[default, item]], table:alias=[item]) HiveProject(c=[$0]) HiveAggregate(group=[{}], c=[COUNT()]) HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) HiveTableScan(table=[[default, item]], table:alias=[item])
The planner is not aware that the i_item_id column has not null defined and it expects null values which needs the extra join.
Attachments
Issue Links
- relates to
-
HIVE-23100 Create RexNode factory and use it in CalcitePlanner
- Closed
- links to