Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 4.0.0-alpha-2
- None
Description
Repro:
SELECT count(*) FROM (
  SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
  FROM dm_ads_dims_prod.cloudera_test3 T0
  LEFT JOIN (
    SELECT T0.plant_no, T0.part_chain
    FROM (
      SELECT T0.plant_no, T0.part_chain, count(*) AS ct
      FROM dm_ads_dims_prod.cloudera_test3 T0
      WHERE purchase_pos = pos
      GROUP BY T0.plant_no, T0.part_chain
    ) T0
    WHERE ct = 2
  ) T1
    ON T0.plant_no = T1.plant_no AND T0.part_chain = T1.part_chain
  WHERE T0.purchase_pos = T0.pos
    AND (T1.part_chain IS NULL OR (T1.part_chain IS NOT NULL AND T0.fd = 1))
) s;
Run the query a few times on the repro cluster with the following settings:
set hive.query.results.cache.enabled=false; set hive.compute.query.using.stats=false; set hive.auto.convert.join=true;
and the results were:
2682424 2682426 2682425
Then turn off hive.auto.convert.join:
set hive.query.results.cache.enabled=false; set hive.compute.query.using.stats=false; set hive.auto.convert.join=false;
and the result was always 2682420
Comparing the plans with hive.auto.convert.join enabled vs. disabled, the difference is the type of join: map join vs. merge join.
Vectorization also plays a role. With it turned off, the result was correct:
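One way to reproduce this plan comparison is to run EXPLAIN on the repro query under each setting and diff the output. The following is a sketch only: it assumes the same table and session as above, and the operator names mentioned in the comments are what Hive's EXPLAIN output typically shows, not text taken from this report.

```sql
-- Plan with map join conversion enabled.
-- The join is expected to show up as a "Map Join Operator".
SET hive.auto.convert.join=true;
EXPLAIN
SELECT count(*) FROM (
  SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
  FROM dm_ads_dims_prod.cloudera_test3 T0
  LEFT JOIN (
    SELECT T0.plant_no, T0.part_chain
    FROM (
      SELECT T0.plant_no, T0.part_chain, count(*) AS ct
      FROM dm_ads_dims_prod.cloudera_test3 T0
      WHERE purchase_pos = pos
      GROUP BY T0.plant_no, T0.part_chain
    ) T0
    WHERE ct = 2
  ) T1
    ON T0.plant_no = T1.plant_no AND T0.part_chain = T1.part_chain
  WHERE T0.purchase_pos = T0.pos
    AND (T1.part_chain IS NULL OR (T1.part_chain IS NOT NULL AND T0.fd = 1))
) s;

-- Plan with map join conversion disabled.
-- The join is expected to show up as a "Merge Join Operator".
SET hive.auto.convert.join=false;
EXPLAIN
SELECT count(*) FROM (
  SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
  FROM dm_ads_dims_prod.cloudera_test3 T0
  LEFT JOIN (
    SELECT T0.plant_no, T0.part_chain
    FROM (
      SELECT T0.plant_no, T0.part_chain, count(*) AS ct
      FROM dm_ads_dims_prod.cloudera_test3 T0
      WHERE purchase_pos = pos
      GROUP BY T0.plant_no, T0.part_chain
    ) T0
    WHERE ct = 2
  ) T1
    ON T0.plant_no = T1.plant_no AND T0.part_chain = T1.part_chain
  WHERE T0.purchase_pos = T0.pos
    AND (T1.part_chain IS NULL OR (T1.part_chain IS NOT NULL AND T0.fd = 1))
) s;
```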
SET hive.vectorized.execution.enabled=false;
This is also only a workaround and has a negative impact on performance, but it should help us narrow down where to find the cause of the issue.
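To see whether the map join in the failing configuration is actually running vectorized, the plan can be inspected with EXPLAIN VECTORIZATION. This is a sketch that was not part of the original investigation; it assumes a Hive version that supports the EXPLAIN VECTORIZATION clause and reuses the failing settings from above.

```sql
-- Failing configuration: vectorization on, map join conversion on.
SET hive.vectorized.execution.enabled=true;
SET hive.auto.convert.join=true;

-- DETAIL prints per-operator vectorization information,
-- including the section describing the map join.
EXPLAIN VECTORIZATION DETAIL
SELECT count(*) FROM (
  SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
  FROM dm_ads_dims_prod.cloudera_test3 T0
  LEFT JOIN (
    SELECT T0.plant_no, T0.part_chain
    FROM (
      SELECT T0.plant_no, T0.part_chain, count(*) AS ct
      FROM dm_ads_dims_prod.cloudera_test3 T0
      WHERE purchase_pos = pos
      GROUP BY T0.plant_no, T0.part_chain
    ) T0
    WHERE ct = 2
  ) T1
    ON T0.plant_no = T1.plant_no AND T0.part_chain = T1.part_chain
  WHERE T0.purchase_pos = T0.pos
    AND (T1.part_chain IS NULL OR (T1.part_chain IS NOT NULL AND T0.fd = 1))
) s;
```

Repeating the same statement with hive.vectorized.execution.enabled=false gives the non-vectorized plan to compare against.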
Issue Links
- is fixed by HIVE-25142 Rehashing in map join fast hash table causing corruption for large keys (Closed)