Hive / HIVE-931

Optimize GROUP BY aggregations where key is a sorted/bucketed column

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      If a table is sorted by a given key, we currently do not take advantage of that ordering for GROUP BY, even though it can be very useful.

      For example, if T is sorted by column c1, then for the query

      select c1, aggr() from T group by c1

      we can always use a single map-reduce job, and no hash table is needed on the mapper, since the data is already sorted by c1.

      This reduces memory pressure on the mapper and removes the overhead of maintaining the hash table.
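
      The difference between the two strategies can be illustrated with a minimal sketch. This is not Hive's implementation or the attached patch. The function names `hash_group_sum` and `sorted_group_sum` are hypothetical, and SUM stands in for `aggr()`. The sketch only shows why sorted input lets the mapper keep one running accumulator instead of one hash-table entry per distinct key.

```python
def hash_group_sum(rows):
    """Hash-based aggregation: keeps one entry per distinct key in memory.

    Works for input in any order, but memory grows with the number of
    distinct keys.
    """
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


def sorted_group_sum(rows):
    """Streaming aggregation: assumes rows arrive sorted by key.

    Only the current key and its running total are held in memory.
    A group is emitted as soon as the key changes.
    """
    started = False
    current_key = None
    total = 0
    for key, value in rows:
        if started and key != current_key:
            # Key changed: the previous group is complete.
            yield current_key, total
            total = 0
        current_key = key
        total += value
        started = True
    if started:
        # Emit the final group.
        yield current_key, total


if __name__ == "__main__":
    # Rows of (c1, value), already sorted by c1 as the issue assumes.
    rows = [("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 1)]
    print(list(sorted_group_sum(rows)))
    print(hash_group_sum(rows))
```

      If the input is not actually sorted by the key, the streaming version would emit the same key more than once, so it is only valid when the sort order is guaranteed.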

      1. hive-931-2009-12-03.patch
        449 kB
        He Yongqiang
      2. hive-931-2009-12-01.patch
        438 kB
        He Yongqiang
      3. hive-931-2009-11-21.patch
        416 kB
        He Yongqiang
      4. hive-931-2009-11-20.3.patch
        416 kB
        He Yongqiang
      5. hive-931-2009-11-19.patch
        326 kB
        He Yongqiang
      6. hive-931-2009-11-18.patch
        310 kB
        He Yongqiang

        Activity

        Namit Jain created issue -
        He Yongqiang made changes -
        Field Original Value New Value
        Attachment hive-931-2009-11-18.patch [ 12425440 ]
        He Yongqiang made changes -
        Attachment hive-931-2009-11-19.patch [ 12425568 ]
        He Yongqiang made changes -
        Attachment hive-931-2009-11-20.3.patch [ 12425656 ]
        He Yongqiang made changes -
        Attachment hive-931-2009-11-21.patch [ 12425751 ]
        He Yongqiang made changes -
        Attachment hive-931-2009-12-01.patch [ 12426635 ]
        He Yongqiang made changes -
        Attachment hive-931-2009-12-03.patch [ 12426835 ]
        Namit Jain made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Carl Steinbach made changes -
        Summary: "Sorted Group By" -> "Optimize GROUP BY aggregations where key is a sorted/bucketed column"
        Carl Steinbach made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: He Yongqiang
          • Reporter: Namit Jain
          • Votes: 1
          • Watchers: 3
