Project: Hive
Parent issue: HIVE-11587 Fix memory estimates for mapjoin hashtable
Issue: HIVE-10793

Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Hive
    • Labels:

      Description

      HybridHashTableContainer allocates memory based on the estimated table size, so if the actual data is smaller than the estimate, part of the allocated memory is never used.

      The number of partitions is calculated from the estimated data size:

      numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, minNumParts, minWbSize,
                nwayConf);
      

      Then, based on the number of partitions, writeBufferSize is set:

      writeBufferSize = (int)(estimatedTableSize / numPartitions);
      

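      Each hash partition allocates one WriteBuffer upfront, and no further allocation happens if the estimated data size is correct. Since numPartitions * writeBufferSize is roughly estimatedTableSize, the whole estimate is allocated before any rows are loaded.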
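
      To make the over-allocation concrete, here is a minimal sketch of the arithmetic described above. It is not Hive code: the class name, the helper methods, and the sample sizes (1 GiB estimate, 16 partitions, 256 MiB of actual data) are assumptions chosen for illustration. Only the writeBufferSize formula is taken from the description.

```java
public class UpfrontAllocationSketch {

    // Same formula as in the description: one write buffer per partition.
    static int writeBufferSize(long estimatedTableSize, int numPartitions) {
        return (int) (estimatedTableSize / numPartitions);
    }

    // Bytes reserved once every partition has allocated its first WriteBuffer.
    static long preallocatedBytes(long estimatedTableSize, int numPartitions) {
        return (long) writeBufferSize(estimatedTableSize, numPartitions) * numPartitions;
    }

    public static void main(String[] args) {
        long estimatedTableSize = 1L << 30; // hypothetical estimate: 1 GiB
        int numPartitions = 16;             // hypothetical partition count
        long actualTableSize = 256L << 20;  // hypothetical real data size: 256 MiB

        long reserved = preallocatedBytes(estimatedTableSize, numPartitions);
        System.out.println("reserved upfront (bytes): " + reserved);
        System.out.println("never used (bytes): " + (reserved - actualTableSize));
    }
}
```

      With these sample numbers, three quarters of the reserved memory is never touched, as long as no single partition outgrows its buffer.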

      The suggested solution is to reduce writeBufferSize by a factor so that only X% of the estimated memory is preallocated.
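
      One way the suggested scaling could look is sketched below. This is an illustration under stated assumptions, not the contents of the attached patches: the class, the method, the preallocFraction parameter (the "X%" expressed as a fraction), and the use of minWbSize as a lower bound on the buffer size are all hypothetical.

```java
public class ScaledWriteBufferSketch {

    // Hypothetical helper: shrink the initial write buffer so that only
    // preallocFraction of the estimate is reserved upfront.
    static int scaledWriteBufferSize(long estimatedTableSize, int numPartitions,
                                     double preallocFraction, int minWbSize) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (preallocFraction <= 0.0 || preallocFraction > 1.0) {
            throw new IllegalArgumentException("preallocFraction must be in (0, 1]");
        }
        long full = estimatedTableSize / numPartitions;  // size from the description
        long scaled = (long) (full * preallocFraction);  // keep only a fraction of it
        // Assumption: never go below a minimum buffer size.
        return (int) Math.max((long) minWbSize, scaled);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 1 GiB estimate, 16 partitions, 25% preallocated, 512 KiB floor.
        int wb = scaledWriteBufferSize(1L << 30, 16, 0.25, 512 * 1024);
        System.out.println("writeBufferSize (bytes): " + wb);
        System.out.println("preallocated total (bytes): " + (long) wb * 16);
    }
}
```

      The trade-off is that partitions holding more data than the reduced buffer have to allocate additional buffers later, in exchange for not reserving memory that an overestimate would leave idle.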

        Attachments

        1. HIVE-10793.2.patch
          5 kB
          Mostafa Mokhtar
        2. HIVE-10793.1.patch
          5 kB
          Mostafa Mokhtar


            People

            • Assignee: Mostafa Mokhtar (mmokhtar)
            • Reporter: Mostafa Mokhtar (mmokhtar)
            • Votes: 0
            • Watchers: 5
