Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
4.0.0-alpha-1
-
None
Description
Reproducer
Consider the following query:
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\); CREATE EXTERNAL TABLE t (a string, b string); INSERT INTO t VALUES ('1000', 'b1'); INSERT INTO t VALUES ('2000', 'b2'); SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;EXPLAIN CBO SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, CAST(NULL AS string) FROM t) AS t2 WHERE a = 1000;
The expected result is:
1000 b1 1000 NULL
An example of correct plan is as follows:
CBO PLAN: HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) HiveTableScan(table=[[default, t]], table:alias=[t])
Consider now a scenario where expression reduction in projections is disabled by setting the following property:
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
In this case, the simplification of CAST(NULL) into NULL does not happen, and we get the following (invalid) result:
1000 b1
produced by the following invalid plan:
CBO PLAN:
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t])
Problem Analysis
At HiveFilterSetOpTransposeRule.java#L112 the RelMetadataQuery::getPulledUpPredicates method infers the following predicate due to the CAST(NULL) in the projection:
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE"))
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred.
In HiveFilterSetOpTransposeRule.java#L114-L122, the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the UnknownAsFalse semantics.
To summarize, the following expression is simplified under the UnknownAsFalse semantics:
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000))
Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to FALSE, because no value is equal to NULL (even NULL itself), AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to FALSE altogether, and the UNION ALL operand is pruned.
Only by chance, when CAST(NULL) is simplified to NULL, we avoid the issue, due to the IS_NULL($1) inferred predicate, see HiveRelMdPredicates.java#L153-L156 for understanding how the NULL literal is treated differently during predicate inference.
HiveRelMdPredicates should not use equality ('=') for nullable constant expressions, but rather IS NOT DISTINCT FROM, as detailed in HIVE-26733, but nonetheless the way simplification is done is not correct here, inferred predicates should be used as "context", rather than been used in a conjunctive expression, this usage does not conform with any of the similar uses of simplification with inferred predicates (see the bottom of the "Solution" section for examples and details).
Solution
In order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with predicates as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account.
The code at HiveFilterSetOpTransposeRule.java#L114-L121 should be replaced by the following:
final RexExecutor executor = Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR); final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor); final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);
This is in line with other uses of simplification, like in Calcite:
You can see the above method used for different kind of rels, always in the way identical to what this ticket advocates:filters:
https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155
Attachments
Issue Links
- relates to
-
HIVE-26733 Not safe to use '=' for predicates on constant expressions that might be NULL
- Open
- links to