Hive / HIVE-931

Optimize GROUP BY aggregations where key is a sorted/bucketed column

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If a table is sorted by a given key, we do not take advantage of that ordering for GROUP BY, even though it can be very useful.

      For example, if T is sorted by column c1, consider the query:

      select c1, aggr() from T group by c1

      We always use a single map-reduce job for this query. No hash table is needed on the mapper, since the data is already sorted by c1: rows with the same key arrive consecutively, so each group can be aggregated as it streams past.

      This will reduce memory pressure on the mapper and remove the overhead of maintaining the hash table.
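      To illustrate the idea in the description, here is a minimal sketch contrasting hash-based aggregation with streaming aggregation over sorted input. It is not Hive's implementation or the attached patch: the function names `hash_group_count` and `stream_group_count`, the `(key, value)` row shape, and the use of a simple count as the aggregate are all assumptions made for the example.

```python
def hash_group_count(rows):
    """Hash aggregation: works for any input order.

    Memory grows with the number of distinct keys, because every
    key seen so far stays in the table until the input ends.
    """
    counts = {}
    for key, _value in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


def stream_group_count(rows):
    """Streaming aggregation: assumes rows are already sorted by key.

    Only the current key and its running count are held in memory.
    A group is emitted as soon as the key changes.
    """
    have_group = False
    current_key = None
    current_count = 0
    for key, _value in rows:
        if have_group and key != current_key:
            # Key changed: the previous group is complete.
            yield current_key, current_count
            current_count = 0
        current_key = key
        current_count += 1
        have_group = True
    if have_group:
        # Flush the final group.
        yield current_key, current_count


if __name__ == "__main__":
    # Hypothetical table T, sorted by c1 (first field).
    table = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)]
    print(list(stream_group_count(table)))
    print(hash_group_count(table))
```

      Both functions produce the same counts on sorted input, but the streaming version never holds more than one group at a time. If the input is not actually sorted by the key, the streaming version emits the same key more than once, which is why this kind of optimization depends on the table's sort metadata being reliable.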

        Attachments

        1. hive-931-2009-12-03.patch
          449 kB
          He Yongqiang
        2. hive-931-2009-12-01.patch
          438 kB
          He Yongqiang
        3. hive-931-2009-11-21.patch
          416 kB
          He Yongqiang
        4. hive-931-2009-11-20.3.patch
          416 kB
          He Yongqiang
        5. hive-931-2009-11-19.patch
          326 kB
          He Yongqiang
        6. hive-931-2009-11-18.patch
          310 kB
          He Yongqiang

            People

            • Assignee:
              He Yongqiang
              Reporter:
              Namit Jain
            • Votes:
              1
              Watchers:
              3
