Hive / HIVE-931

Optimize GROUP BY aggregations where key is a sorted/bucketed column


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If a table is sorted by a given key, we currently do not take advantage of that sort order when evaluating a GROUP BY on that key. Doing so could be very useful.

      For example, if T is sorted by column c1, then for

      select c1, aggr() from T group by c1

      we can always use a single map-reduce job. No hash table is needed on the mapper, since the data is already sorted by c1.

      This reduces memory pressure on the mapper and also removes the overhead of maintaining the hash table.
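      A minimal Java sketch of the idea above, assuming rows reach the aggregator already sorted by the grouping key. The class and method names (SortedGroupBy, streamingGroupBy, hashGroupBy) are hypothetical, SUM stands in for the generic aggr(), and this is not Hive's actual GroupByOperator code. The streaming version keeps only the current key and its running aggregate, while the hash version must keep one entry per distinct key until the input ends. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class SortedGroupBy {

    /** One input row: the grouping key (c1) and a value to aggregate. */
    public record Row(String key, long value) {}

    /** One output row: the key and its aggregate (hypothetical: SUM stands in for aggr()). */
    public record Result(String key, long sum) {}

    /**
     * Streaming (sort-based) aggregation. Assumes rows are sorted by key, so all
     * rows for a key are contiguous. Only the current group's state is held.
     */
    public static List<Result> streamingGroupBy(List<Row> sortedRows) {
        List<Result> out = new ArrayList<>();
        boolean haveGroup = false;
        String currentKey = null;
        long runningSum = 0;
        for (Row row : sortedRows) {
            if (haveGroup && !Objects.equals(currentKey, row.key())) {
                // Key changed: the previous group is complete and can be emitted now.
                out.add(new Result(currentKey, runningSum));
                runningSum = 0;
            }
            haveGroup = true;
            currentKey = row.key();
            runningSum += row.value();
        }
        if (haveGroup) {
            out.add(new Result(currentKey, runningSum)); // flush the last group
        }
        return out;
    }

    /**
     * Hash-based aggregation. Works on unsorted input, but the table holds one
     * entry per distinct key until the whole input has been read.
     */
    public static Map<String, Long> hashGroupBy(List<Row> rows) {
        Map<String, Long> table = new HashMap<>();
        for (Row row : rows) {
            table.merge(row.key(), row.value(), Long::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 1), new Row("a", 2),
            new Row("b", 5),
            new Row("c", 3), new Row("c", 4));
        System.out.println("streaming: " + streamingGroupBy(rows));
        System.out.println("hash table entries held: " + hashGroupBy(rows).size());
    }
}
```

      The streaming version is only correct when equal keys are contiguous, which is exactly the guarantee a table sorted by c1 provides; on unsorted input it would emit the same key more than once, which is why the hash table is needed in the general case.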

      Attachments

        1. hive-931-2009-11-18.patch (310 kB, He Yongqiang)
        2. hive-931-2009-11-19.patch (326 kB, He Yongqiang)
        3. hive-931-2009-11-20.3.patch (416 kB, He Yongqiang)
        4. hive-931-2009-11-21.patch (416 kB, He Yongqiang)
        5. hive-931-2009-12-01.patch (438 kB, He Yongqiang)
        6. hive-931-2009-12-03.patch (449 kB, He Yongqiang)


          People

            Assignee: He Yongqiang
            Reporter: Namit Jain
            Votes: 1
            Watchers: 3
