Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.12.0
-
None
Description
When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a Map-only job (MapJoin) to its following MapReduce job. But this merge only happens when the MapReduce job has a single input. With Correlation Optimizer (HIVE-2206), it is possible that the MapReduce job can have multiple inputs (for multiple operation paths). It is desired to improve CommonJoinResolver to merge a Map-only job to the corresponding Map task of the MapReduce job.
Example:
set hive.optimize.correlation=true; set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; SELECT tmp1.key, count(*) FROM (SELECT x1.key1 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) GROUP BY x1.key1) tmp1 JOIN (SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2) GROUP BY x2.key2) tmp2 ON (tmp1.key = tmp2.key) GROUP BY tmp1.key;
In this query, join operations inside tmp1 and tmp2 will be converted to two MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of tmp1 and tmp2, and the last aggregation will be executed in the same MapReduce job (Reduce side). Since this MapReduce job has two inputs, right now, CommonJoinResolver cannot attach two MapJoins to the Map side of a MapReduce job.
Another example:
SELECT tmp1.key FROM (SELECT x1.key2 AS key FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1) UNION ALL SELECT x2.key2 AS key FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
For this case, we will have three Map-only jobs (two for MapJoins and one for Union). It will be good to use a single Map-only job to execute this query.
Attachments
Attachments
Issue Links
- is related to
-
HIVE-5891 Alias conflict when merging multiple mapjoin tasks into their common child mapred task
- Resolved
-
HIVE-4927 When we merge two MapJoin MapRedTasks, the TableScanOperator of the second one should be removed
- Closed
- relates to
-
HIVE-2206 add a new optimizer for query correlation discovery and optimization
- Closed
-
HIVE-3952 merge map-job followed by map-reduce job
- Closed
- links to