  1. Hive
  2. HIVE-3667 Umbrella jira for Correlation Optimizer
  3. HIVE-3669

Support queries in which the input tables of correlated MR jobs involve intermediate tables

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      The correlation optimizer implemented in HIVE-2206 does not optimize correlated MapReduce jobs that take intermediate tables as input.

      Here is an example, originally posted in HIVE-3430:

      select * from
      (
        select c.value, count(1) as cnt from
        (
          select b.key, b.value from
          (
            select key, length(value) from T1 where ds = '1'
          ) a
          join
          T2 b on b.ds = '1' and a.key = b.key
        ) c
        group by c.value
      ) d
      join
      (
        select value, count(1) as cnt from T2 c where c.ds = '1' group by value
      ) e
      on d.value = e.value;
      

      Because the correlated MapReduce jobs (those that use "value" as the partitioning key) involve an intermediate table "c", the implementation in HIVE-2206 does not optimize this query.
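
The reasoning above can be sketched in a few lines of Python. This is only an illustrative model, not Hive code: the decomposition of the query into four jobs, the job names, and the helper functions are all assumptions made for the example. It groups jobs by partitioning key and then reports which inputs of a correlated group are intermediate results produced outside that group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str           # hypothetical label for a MapReduce job
    partition_key: str  # column the shuffle partitions on
    inputs: tuple       # tables or subquery aliases the job reads
    output: str         # alias of the result the job produces


# Assumed decomposition of the example query (not taken from a real plan).
JOBS = [
    Job("join_a_b", "key", ("T1", "T2"), "c"),       # join on key
    Job("agg_c", "value", ("c",), "d"),              # group by c.value
    Job("agg_T2", "value", ("T2",), "e"),            # group by value
    Job("join_d_e", "value", ("d", "e"), "result"),  # join on value
]
BASE_TABLES = {"T1", "T2"}


def correlated_groups(jobs):
    """Group jobs by partitioning key; keep groups with two or more jobs."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.partition_key].append(job)
    return {key: group for key, group in groups.items() if len(group) > 1}


def external_intermediate_inputs(group, base_tables):
    """Inputs that are neither base tables nor produced inside the group."""
    produced = {job.output for job in group}
    read = {table for job in group for table in job.inputs}
    return sorted(read - set(base_tables) - produced)


for key, group in correlated_groups(JOBS).items():
    names = [job.name for job in group]
    print(key, names, external_intermediate_inputs(group, BASE_TABLES))
```

Under these assumptions, the three jobs partitioned on "value" form one correlated group, and the only input they take from outside the group that is not a base table is "c", which is the case this issue asks the optimizer to support.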

              People

              • Assignee:
                yhuai Yin Huai
                Reporter:
                yhuai Yin Huai
              • Votes:
                0
                Watchers:
                2
