Hive / HIVE-5258

Optimize aggregations without Group By followed by a Cross Join

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      For example, the following query should execute as a single MapReduce (MR) job:

      SELECT *
      FROM (SELECT tmp1.cnt1, tmp2.cnt2
            FROM (SELECT count(*) as cnt1
                  FROM src1 x) tmp1
            JOIN (SELECT count(*) as cnt2
                  FROM src1 y) tmp2) tmp3;
      

      The reduce phase of that job should contain the reduce-side GroupByOperators for tmp1 and tmp2, together with the JoinOperator for the cross join.
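
To make the proposed plan concrete, here is a minimal Python sketch of the data flow, not Hive code. It assumes map-side partial aggregation and a single reducer; the function names (map_phase, reduce_phase, run_single_job) and the "tmp1"/"tmp2" tags are hypothetical and only mirror the aliases in the query above.

```python
from itertools import product


def map_phase(rows, tag):
    """Map side: emit one partial count(*) for this input split, tagged by subquery."""
    return [(tag, len(rows))]


def reduce_phase(pairs):
    """Single reducer: finish both aggregations, then cross join the results."""
    totals = {"tmp1": 0, "tmp2": 0}
    for tag, partial in pairs:
        # Reduce-side group-by: there is no GROUP BY key, so one group per tag.
        totals[tag] += partial
    tmp1 = [(totals["tmp1"],)]  # one-row result of the first subquery
    tmp2 = [(totals["tmp2"],)]  # one-row result of the second subquery
    # Cross join of two one-row relations yields exactly one row.
    return [left + right for left, right in product(tmp1, tmp2)]


def run_single_job(splits):
    """Both subqueries scan src1, so each split feeds both aggregations."""
    pairs = []
    for split in splits:
        pairs += map_phase(split, "tmp1")
        pairs += map_phase(split, "tmp2")
    return reduce_phase(pairs)


result = run_single_job([["a", "b", "c"], ["d", "e"]])
print(result)
```

Because neither aggregation has a GROUP BY key, each subquery produces a single row, so the cross join in the reducer is cheap. This is why folding everything into one reduce phase looks attractive.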

        Activity

        Yin Huai added a comment -

        Handling aggregations with the DISTINCT keyword will be tricky. If the columns under DISTINCT have many distinct values, we may overwhelm the single reducer.
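
The concern can be illustrated with a small Python sketch (an assumption-laden model, not Hive's actual implementation). It assumes mappers can only de-duplicate within their own split, so the single reducer still receives at least one record per distinct value. The function name count_distinct_single_reducer is hypothetical.

```python
def count_distinct_single_reducer(splits):
    """Model count(DISTINCT col) with map-side de-duplication and one reducer.

    Returns (distinct_count, records_shuffled_to_reducer).
    """
    # Map side: each split can only remove duplicates it sees locally.
    partials = [set(split) for split in splits]
    records_shuffled = sum(len(p) for p in partials)

    # Reduce side: the lone reducer must merge every surviving value.
    seen = set()
    for partial in partials:
        seen |= partial
    return len(seen), records_shuffled


count, shuffled = count_distinct_single_reducer([[1, 2, 2], [2, 3]])
print(count, shuffled)
```

Unlike count(*), where each mapper sends a single number, the reducer's input here grows with the number of distinct values, which is the load the comment warns about.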


          People

          • Assignee: Yin Huai
          • Reporter: Yin Huai
          • Votes: 0
          • Watchers: 2
