[HIVE-3669] Support queries in which input tables of correlated MR jobs involve intermediate tables - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Query Processor
Labels: None

Description

The correlation optimizer implemented in ~~HIVE-2206~~ does not optimize correlated MapReduce jobs that take intermediate tables as input.

Here is an example originally posted in ~~HIVE-3430~~:

select * from
(
  select c.value, count(1) as cnt from
  (
    select b.key, b.value from
    (
      select key, length(value) from T1 where ds = '1'
    ) a
    join
    T2 b on b.ds = '1' and a.key = b.key
  ) c
  group by c.value
) d
join
(
  select value, count(1) as cnt from T2 c where c.ds = '1' group by value
) e
on d.value = e.value;

Because the correlated MapReduce jobs (those that use "value" as the partitioning key) involve an intermediate table "c", the implementation of ~~HIVE-2206~~ does not optimize this query.
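
The situation can be illustrated with a small Python model. This is only a sketch and not Hive code. It assumes the query compiles, without the optimizer, into four MapReduce jobs: a join on "key" that produces "c", two aggregations on "value", and a final join on "value". The job names, the `Job` class, and the helper functions are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Base tables that appear in the example query.
BASE_TABLES = {"T1", "T2"}


@dataclass(frozen=True)
class Job:
    name: str            # hypothetical label for one MapReduce job
    partition_key: str   # key used to shuffle rows to reducers
    inputs: tuple        # base tables or names of upstream jobs


# Assumed unoptimized plan for the example query (illustrative only).
JOBS = [
    Job("join_a_b", "key", ("T1", "T2")),          # produces subquery c
    Job("agg_d", "value", ("join_a_b",)),          # group by c.value
    Job("agg_e", "value", ("T2",)),                # group by value on T2
    Job("join_d_e", "value", ("agg_d", "agg_e")),  # d join e on value
]


def correlated_on(jobs, key):
    """Return the jobs that share the given partitioning key."""
    return [job for job in jobs if job.partition_key == key]


def reads_outside_intermediate(job, group):
    """True if the job reads the output of a job outside the group."""
    group_names = {member.name for member in group}
    return any(
        source not in BASE_TABLES and source not in group_names
        for source in job.inputs
    )


group = correlated_on(JOBS, "value")
blockers = [job.name for job in group if reads_outside_intermediate(job, group)]

print("correlated on 'value':", [job.name for job in group])
print("reads an intermediate table:", blockers)
```

Under these assumptions, the three jobs partitioned on "value" form the correlated group, and the aggregation over "c" is the one member whose input is produced by a job outside that group. This is the case the issue asks the optimizer to support.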

Issue Links

is blocked by

HIVE-2206 add a new optimizer for query correlation discovery and optimization

Closed

People

Assignee: Yin Huai

Reporter: Yin Huai

Votes: 0

Watchers: 2

Dates

Created: 05/Nov/12 15:01

Updated: 05/Nov/12 15:03