Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Currently, if "hive.tez.dynamic.semijoin.reduction" is enabled (the default value is true) in Hive on Spark, the following script fails:
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.dynamic.partition.pruning=true;
-- multiple sources, single key
select count(*) from srcpart join srcpart_date on (srcpart.ds = srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
For the reason this fails, see HIVE-16780. For now, we only disable "hive.tez.dynamic.semijoin.reduction" when running Hive on Spark so that the test passes. Later, we can implement a feature similar to what Hive on Tez does.
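A minimal sketch of that workaround, assuming the property is switched off at session level before the failing query runs. The description only says the property is disabled for Hive on Spark; where it is disabled (session, test file, or configuration) is an assumption here.

```sql
-- Assumption: the property is disabled per session; the actual test setup
-- may disable it elsewhere (for example in the .q file or the test configuration).
set hive.tez.dynamic.semijoin.reduction=false;
set hive.spark.dynamic.partition.pruning=true;

-- multiple sources, single key
select count(*)
from srcpart
join srcpart_date on (srcpart.ds = srcpart_date.ds)
join srcpart_hour on (srcpart.hr = srcpart_hour.hr);
```

With semijoin reduction off, Spark dynamic partition pruning is the only runtime-filtering optimization left enabled in this sketch.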
Issue Links
- relates to: HIVE-16780 Case "multiple sources, single key" in spark_dynamic_pruning.q fails (Closed)
- requires: HIVE-15269 Dynamic Min-Max/BloomFilter runtime-filtering for Tez (Closed)