Hive / HIVE-1139

GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OutOfMemory (OOM) error when there are too many distinct keys for a particular mapper. A workaround is to set the map split size smaller so that each mapper processes fewer rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk.
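
      The spill idea described above can be illustrated with a minimal sketch. This is not Hive's GroupByOperator or HashMapWrapper code: the class name SpillingCounter, the maxInMemoryKeys threshold, and the tab-separated spill file are all hypothetical, chosen only to show how bounding the in-memory map and writing partial aggregates to disk keeps memory use fixed. It assumes a simple COUNT aggregate and keys that contain no tab or newline characters.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bounded hash aggregation that spills to disk. */
class SpillingCounter implements AutoCloseable {
    private final int maxInMemoryKeys;                 // assumed tuning knob
    private final Map<String, Long> counts = new HashMap<>();
    private final Path spillFile;
    private final BufferedWriter spillWriter;

    SpillingCounter(int maxInMemoryKeys) throws IOException {
        this.maxInMemoryKeys = maxInMemoryKeys;
        this.spillFile = Files.createTempFile("groupby-spill", ".tsv");
        this.spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
    }

    /** Counts one row for the key, spilling first if a new key would exceed the bound. */
    void add(String key) throws IOException {
        if (!counts.containsKey(key) && counts.size() >= maxInMemoryKeys) {
            spill();
        }
        counts.merge(key, 1L, Long::sum);
    }

    /** Writes every in-memory partial aggregate as "key<TAB>count" and frees the map. */
    private void spill() throws IOException {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            spillWriter.write(e.getKey() + "\t" + e.getValue());
            spillWriter.newLine();
        }
        counts.clear();
    }

    /** Spills whatever is left and returns the file holding all partial aggregates. */
    Path finish() throws IOException {
        spill();
        spillWriter.close();
        return spillFile;
    }

    int inMemoryKeys() {
        return counts.size();
    }

    @Override
    public void close() throws IOException {
        spillWriter.close();
        Files.deleteIfExists(spillFile);
    }
}
```

      In this sketch the same key may appear on several lines of the spill file, once per spill. That is acceptable for a partial aggregation, because a later stage (the reducer, in the setting described here) still has to merge partial results per key.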

      Attachments

        1. PersistentMap.zip
          13 kB
          Soundararajan Velu

          People

            Assignee: Arvind Prabhakar (aprabhakar)
            Reporter: Ning Zhang (nzhang)
            Votes: 0
            Watchers: 2