Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
4.0.0
Description
Reproducing query is as follows:
create table test1 (act_nbr string); create table test2 (month int); create table test3 (mth int, con_usd double); EXPLAIN SELECT c.month, d.con_usd FROM (SELECT cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month FROM test1 UNION ALL SELECT month FROM test2 WHERE month = 202110) c JOIN test3 d ON c.month = d.mth;
Different plans are generated during the first CBO steps, last being:
2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] parse.CalcitePlanner: Plan after decorre lation: HiveProject(month=[$0], con_usd=[$2]) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(month=[$0]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-d d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF- 16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[$0]) HiveFilter(condition=[=($0, CAST(202110):INTEGER)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveTableScan(table=[[default, test3]], table:alias=[d])
Then, the HEP planner will keep expanding the filter expression with redundant expressions, such as the following, where the identical CAST expression is present multiple times:
rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, 202110))
The problem seems to come from a bad interaction of at least HiveFilterProjectTransposeRule and HiveJoinPushTransitivePredicatesRule, possibly more.
Most probably then UNION part can be removed and the reproducer be simplified even further.
Attachments
Issue Links
- duplicates
-
HIVE-22710 OOM Exception when enable CBO
- Resolved
- is related to
-
HIVE-25851 Replace HiveRelMdPredicate with RelMdPredicate from Calcite
- Open
- relates to
-
HIVE-25275 OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
- Resolved
- links to