Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Description
The correlation optimizer implemented in HIVE-2206 does not optimize correlated MapReduce jobs that take intermediate tables as input.
Here is an example originally posted in HIVE-3430:
select *
from (
  select c.value, count(1) as cnt
  from (
    select b.key, b.value
    from (
      select key, length(value)
      from T1
      where ds = '1'
    ) a
    join T2 b on b.ds = '1' and a.key = b.key
  ) c
  group by c.value
) d
join (
  select value, count(1) as cnt
  from T2 c
  where c.ds = '1'
  group by value
) e on d.value = e.value;
Because the correlated MapReduce jobs (those that use "value" as the partitioning key) involve the intermediate table "c", the HIVE-2206 implementation does not optimize this query.
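To illustrate why these jobs count as correlated, here is a toy Python sketch. It is not Hive code. The function names, sample rows and reducer count are all hypothetical, and the sketch assumes the intermediate table "c" has already been materialized. The GROUP BY that produces "d", the GROUP BY that produces "e" and the final join all partition on "value". A single hash shuffle on "value" therefore sends every row with a given value to the same reducer, and that reducer can finish both aggregations and the join locally. The sketch checks that this one-shuffle plan gives the same result as running the three operations separately.

```python
import zlib
from collections import Counter, defaultdict

# Made-up sample rows, for illustration only.
# C_ROWS stands in for the intermediate table "c" (columns: key, value);
# T2_ROWS stands in for T2 filtered on ds = '1' (columns: key, value).
C_ROWS = [("k1", "x"), ("k2", "y"), ("k3", "x")]
T2_ROWS = [("k1", "x"), ("k4", "z"), ("k5", "x"), ("k6", "y")]
NUM_REDUCERS = 2  # hypothetical reducer count


def reducer_for(value: str) -> int:
    # Deterministic hash partitioning on the shared key "value".
    return zlib.crc32(value.encode("utf-8")) % NUM_REDUCERS


def uncorrelated_plan(c_rows, t2_rows):
    # Three separate steps: GROUP BY for d, GROUP BY for e, then the JOIN.
    d = Counter(value for _, value in c_rows)
    e = Counter(value for _, value in t2_rows)
    return sorted((v, d[v], v, e[v]) for v in d if v in e)


def correlated_plan(c_rows, t2_rows):
    # One shuffle: route every row by "value", tagged with its source.
    buckets = defaultdict(list)
    for _, value in c_rows:
        buckets[reducer_for(value)].append(("d", value))
    for _, value in t2_rows:
        buckets[reducer_for(value)].append(("e", value))

    out = []
    for rows in buckets.values():
        # All rows sharing a value land on the same reducer, so both
        # aggregations and the join can be finished locally.
        d = Counter(v for tag, v in rows if tag == "d")
        e = Counter(v for tag, v in rows if tag == "e")
        out.extend((v, d[v], v, e[v]) for v in d if v in e)
    return sorted(out)


if __name__ == "__main__":
    print(correlated_plan(C_ROWS, T2_ROWS))  # [('x', 2, 'x', 2), ('y', 1, 'y', 1)]
    assert correlated_plan(C_ROWS, T2_ROWS) == uncorrelated_plan(C_ROWS, T2_ROWS)
```

In the real query, the "d" side reads "c", which a separate join on "key" produces. Per this issue, that intermediate input is what keeps the HIVE-2206 implementation from merging these jobs.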
Issue Links
is blocked by
  HIVE-2206 add a new optimizer for query correlation discovery and optimization (Closed)