HIVE-1772

optimize join followed by a groupby

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      EXPLAIN SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) GROUP BY x.key;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-2 depends on stages: Stage-1
      Stage-0 is a root stage

      The above query runs as 2 map-reduce jobs.
      The first MR job performs the join, and the second MR job performs the group by.
      Since the rows reaching the join's reducer are already sorted on x.key, which is both the join key and the group by key, the group by can be performed in the reducer of the join itself.
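
      To make the proposal concrete, here is a minimal Python sketch of a single reducer that performs both the join and the count in one pass. It is not Hive code: the names `shuffle` and `join_then_count` and the tag convention (0 for src1 x, 1 for src y) are assumptions made for illustration only. It relies on the same property the description relies on, namely that rows arrive at the reducer sorted by the join key.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(x_keys, y_keys):
    """Hypothetical stand-in for the map and shuffle phase.

    Tags each key with its source table (0 = src1 x, 1 = src y) and
    sorts by key, so equal keys reach the reducer as one contiguous run.
    """
    tagged = [(k, 0) for k in x_keys] + [(k, 1) for k in y_keys]
    return sorted(tagged)


def join_then_count(tagged_rows):
    """Single reducer: inner join on key, then count(1) per key.

    tagged_rows must be sorted by key. For each key, the inner join
    produces (rows from x) * (rows from y) joined rows, so that product
    is the group-by count. No second job is needed.
    """
    for key, group in groupby(tagged_rows, key=itemgetter(0)):
        left = right = 0
        for _, tag in group:
            if tag == 0:
                left += 1
            else:
                right += 1
        joined = left * right
        if joined:  # keys missing from either side drop out of an inner join
            yield key, joined


result = list(join_then_count(shuffle(["a", "b", "b", "c"],
                                      ["b", "b", "b", "c", "d"])))
print(result)  # [('b', 6), ('c', 1)]
```

      The sketch only covers the case in this issue, where the group by key equals the join key. If the two keys differed, the join output would need to be repartitioned and a second job would still be required.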


          Activity

          Yin Huai made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Not A Problem [ 8 ]
          Yin Huai made changes -
          Link This issue relates to HIVE-3667 [ HIVE-3667 ]
          Yin Huai made changes -
          Link This issue relates to HIVE-2206 [ HIVE-2206 ]
          Navis made changes -
          Link This issue relates to HIVE-3430 [ HIVE-3430 ]
          Navis made changes -
          Assignee Navis [ navis ]
          John Sichi made changes -
          Assignee Navis [ navis ]
          Navis made changes -
          Field Original Value New Value
          Attachment HIVE-1772.1.patch [ 12489320 ]
          Namit Jain created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Namit Jain
            • Votes:
              0
              Watchers:
              6
