Hive / HIVE-1772

optimize join followed by a groupby

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      explain SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) group by x.key;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-2 depends on stages: Stage-1
      Stage-0 is a root stage

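      The query above compiles to two map-reduce jobs (Stage-1 and Stage-2).
      The first MR job performs the join, and the second MR job performs the group by.
      Because the join output is already sorted on the join key x.key, which is also the group-by key, the group by could be performed in the reducer of the join itself, saving the second job.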
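
      The idea can be illustrated with a small simulation. This is only a sketch, not Hive code and not the attached patch: the function name join_then_count and the "x"/"y" tags are hypothetical, and the sort stands in for the shuffle phase. It assumes the join key and the group-by key are the same column, as in the query above. Each reducer call sees all rows for one key from both tables, so it can emit the per-key count of joined rows directly instead of writing the joined rows out for a second job.

```python
from itertools import groupby
from operator import itemgetter


def join_then_count(x_keys, y_keys):
    """Simulate a single reduce phase that joins on key and counts per key.

    x_keys and y_keys are the join-key values of the rows in the two
    tables (hypothetical stand-ins for src1.key and src.key).
    """
    # Shuffle: tag each record with its source table and sort by key,
    # which is what the framework does before invoking the reducer.
    tagged = sorted(
        [(k, "x") for k in x_keys] + [(k, "y") for k in y_keys],
        key=itemgetter(0),
    )

    result = []
    # Reduce: each group holds every row for one key from both tables.
    for key, group in groupby(tagged, key=itemgetter(0)):
        tags = [tag for _, tag in group]
        # An inner join emits one row per (x row, y row) pair for this key,
        # so count(1) for the key is the product of the two row counts.
        joined_rows = tags.count("x") * tags.count("y")
        if joined_rows:  # keys missing from either side produce no rows
            result.append((key, joined_rows))
    return result


if __name__ == "__main__":
    print(join_then_count(["a", "a", "b", "c"], ["a", "b", "b", "d"]))
```

      Because the aggregation happens inside the same pass that sees the sorted, co-grouped rows, no intermediate join output has to be materialized or re-shuffled in this sketch.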

      Attachments

        1. HIVE-1772.1.patch
          14 kB
          Navis Ryu

            People

              Assignee: Unassigned
              Reporter: Namit Jain