Hive / HIVE-3669 (sub-task of HIVE-3667, Umbrella jira for Correlation Optimizer)

Support queries in which the input tables of correlated MR jobs involve intermediate tables

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      The correlation optimizer implemented in HIVE-2206 does not optimize correlated MapReduce jobs that take intermediate tables as input.

      Here is an example, originally posted in HIVE-3430:

      select * from
      (
        select c.value, count(1) as cnt from
        (
          select b.key, b.value from
          (
            select key, length(value) from T1 where ds = '1'
          ) a
          join
          T2 b on b.ds = '1' and a.key = b.key
        ) c
        group by c.value
      ) d
      join
      (
        select value, count(1) as cnt from T2 c where c.ds = '1' group by value
      ) e
      on d.value = e.value;
      

      Because the correlated MapReduce jobs (those that use "value" as the partitioning key) involve an intermediate table "c", the implementation in HIVE-2206 does not optimize this query.
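
The situation described above can be sketched as a small model of the query plan. This is only an illustration, not Hive's optimizer code: it assumes that each join and each group by in the example compiles to its own MapReduce job, and the job names, the `JOBS` list, and both helper functions are hypothetical. It groups jobs by partitioning key and reports which inputs of a correlated group are intermediate tables produced outside that group.

```python
from collections import defaultdict

# Hypothetical model of the example query plan, NOT Hive's real plan classes.
# Each entry: (job name, partitioning key, input tables, output table).
# Assumption: every join and every group by becomes one MapReduce job.
JOBS = [
    ("join_a_b", "key", ["T1", "T2"], "c"),
    ("groupby_c", "value", ["c"], "d"),
    ("groupby_T2", "value", ["T2"], "e"),
    ("join_d_e", "value", ["d", "e"], "result"),
]
BASE_TABLES = {"T1", "T2"}


def correlated_groups(jobs):
    """Group jobs by partitioning key; keep keys shared by two or more jobs."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job[1]].append(job)
    return {key: group for key, group in groups.items() if len(group) > 1}


def external_intermediate_inputs(group, base_tables):
    """Inputs that are neither base tables nor produced inside the group."""
    produced_inside = {output for _, _, _, output in group}
    inputs = {table for _, _, tables, _ in group for table in tables}
    return sorted(inputs - base_tables - produced_inside)


groups = correlated_groups(JOBS)
for key, group in groups.items():
    names = [name for name, _, _, _ in group]
    print(key, names, external_intermediate_inputs(group, BASE_TABLES))
```

In this model the only correlated group is the one partitioned on "value", and its only external intermediate input is "c", which matches the case this sub-task asks to support.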

            People

              Assignee: Yin Huai (yhuai)
              Reporter: Yin Huai (yhuai)
              Votes: 0
              Watchers: 2