optimize multi-group by

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: HIVE-609. Optimize multi-group by. (Namit Jain via zshao)

    Description

      For a query like:

      from src
      insert overwrite table dest1 select col1, count(distinct colx) group by col1
      insert overwrite table dest2 select col2, count(distinct colx) group by col2;

      If map-side aggregation is turned off, we currently run 4 map-reduce jobs.
      The plan can be optimized to run in 3 map-reduce jobs by first spraying (partitioning) the rows
      on the distinct column and then aggregating the individual results.

      This may not be possible if there are multiple distinct columns, but the query above is very common
      in data warehousing environments.
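
One way to read the proposed 3-job plan is sketched below in Python. This is an illustration only, not the code in the attached patches: the function names, the row layout (dicts with col1, col2, colx), and the reducer count are all assumptions made for the example. The point it shows is why spraying on the distinct column works: every row with a given colx value lands in the same partition, so the per-partition distinct counts for a group never overlap and can simply be summed.

```python
from collections import defaultdict


def spray(rows, num_reducers):
    """Job 1 shuffle: route each row by the distinct column (colx)."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[hash(row["colx"]) % num_reducers].append(row)
    return partitions


def partial_distinct_counts(partition, key):
    """Job 1 reduce: count(distinct colx) per group, within one partition."""
    seen = defaultdict(set)
    for row in partition:
        seen[row[key]].add(row["colx"])
    return {group: len(values) for group, values in seen.items()}


def merge_partials(partials):
    """Jobs 2 and 3: sum the partial counts for each group."""
    totals = defaultdict(int)
    for partial in partials:
        for group, count in partial.items():
            totals[group] += count
    return dict(totals)


def multi_group_by(rows, num_reducers=3):
    """Simulate the plan: one spray job, then one aggregation job per insert."""
    partitions = spray(rows, num_reducers)
    dest1 = merge_partials(partial_distinct_counts(p, "col1") for p in partitions)
    dest2 = merge_partials(partial_distinct_counts(p, "col2") for p in partitions)
    return dest1, dest2
```

Calling multi_group_by(rows) returns two dicts, one per insert clause, mapping each col1 (respectively col2) value to its distinct colx count. With two different distinct columns there is no single spray key that keeps both sets of partial counts disjoint, which is consistent with the caveat in the description.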

      Attachments

        1. hive.609.1.patch
          90 kB
          Namit Jain
        2. hive.609.2.patch
          92 kB
          Namit Jain
        3. hive.609.3.patch
          82 kB
          Namit Jain
        4. hive.609.4.patch
          41 kB
          Namit Jain
        5. hive.609.5.patch
          2 kB
          Namit Jain
        6. hive.609.6.patch
          85 kB
          Namit Jain
        7. hive.609.7.patch
          95 kB
          Namit Jain
        8. hive.609.10.patch
          2 kB
          Namit Jain
        9. hive.609.11.patch
          115 kB
          Namit Jain

            People

              Assignee: Namit Jain
              Reporter: Namit Jain
              Votes: 0
              Watchers: 1