Details
Sub-task
Status: Closed
Major
Resolution: Fixed
None
Description
HoS DPP will split a given operator tree in two under the following conditions: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP).
This can hurt performance if the non-partitioned side of the join involves a complex operator tree. For example, the query select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart) will require running the subquery twice, once in each Spark job.
Queries with map-joins don't get split into two operator trees and thus don't suffer from this drawback. It would therefore be nice to have a config key that enables DPP on HoS only for map-joins.
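A rough sketch of how such a setting might be used in a session. The first key is the existing switch for DPP on HoS. The second key name is an assumption made for illustration only, since this description does not name the new key, and the behavior noted in the comments is the intended behavior rather than a confirmed one.

```sql
-- Existing switch: enable dynamic partition pruning for Hive on Spark.
set hive.spark.dynamic.partition.pruning=true;

-- Assumed name for the proposed key: apply DPP only when the join is a
-- map-join, so the operator tree is never split into two Spark jobs.
set hive.spark.dynamic.partition.pruning.map.join.only=true;

-- With the map-join-only mode on, the intent is that the subquery below is
-- not run twice if the join is executed as a common (shuffle) join.
select count(*) from srcpart
where srcpart.ds in (
  select max(srcpart.ds) from srcpart
  union all
  select min(srcpart.ds) from srcpart
);
```

Comparing the EXPLAIN output with the second setting on and off should show whether a separate pruning job is generated for the subquery.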