Hive / HIVE-11587 Fix memory estimates for mapjoin hashtable / HIVE-10793

Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Hive

    Description

      HybridHashTableContainer allocates memory upfront based on the estimated table size, so if the actual data size is smaller than the estimate, part of the allocated memory is never used.

      The number of partitions is calculated from the estimated data size:

      numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, minNumParts, minWbSize,
                nwayConf);
      

      writeBufferSize is then set from the number of partitions:

      writeBufferSize = (int)(estimatedTableSize / numPartitions);
      

      Each hash partition allocates one WriteBuffer, so the full estimated size is reserved upfront, and no further allocation is needed if the estimated data size is correct.
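
      To make the effect concrete, here is a minimal sketch of the arithmetic. It reuses only the writeBufferSize expression quoted in this description; the class name, the helper names, and the sizes (64 MB estimate, 16 partitions, 8 MB actual) are hypothetical and are not taken from the Hive source.

```java
public class UpfrontAllocation {
    // Same expression as in the description: one write buffer per partition.
    static int writeBufferSize(long estimatedTableSize, int numPartitions) {
        return (int) (estimatedTableSize / numPartitions);
    }

    // Bytes reserved immediately when every partition allocates one write buffer.
    static long preallocatedBytes(long estimatedTableSize, int numPartitions) {
        return (long) writeBufferSize(estimatedTableSize, numPartitions) * numPartitions;
    }

    public static void main(String[] args) {
        long estimatedTableSize = 64L * 1024 * 1024; // hypothetical 64 MB estimate
        int numPartitions = 16;                      // hypothetical partition count
        long actualTableSize = 8L * 1024 * 1024;     // hypothetical 8 MB of real data

        long reserved = preallocatedBytes(estimatedTableSize, numPartitions);
        System.out.println("reserved upfront: " + reserved + " bytes");
        System.out.println("never used: " + (reserved - actualTableSize) + " bytes");
    }
}
```

      With these assumed numbers the whole estimate is reserved before any row is loaded, and most of it stays idle when the estimate is too high.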

      The suggested solution is to reduce writeBufferSize by a factor such that only X% of the estimated memory is preallocated; the rest is allocated only if the data actually requires it.
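
      One way the suggestion could look is sketched below. This is not the attached patch: the method name scaledWriteBufferSize and the parameter preallocFraction (the "X%" above, as a value in (0, 1]) are hypothetical, and using minWbSize as a lower bound on the buffer size is an assumption.

```java
public class ScaledWriteBuffer {
    // Hypothetical helper: shrink the per-partition write buffer so that only
    // preallocFraction of the estimated table size is reserved upfront.
    static int scaledWriteBufferSize(long estimatedTableSize, int numPartitions,
                                     double preallocFraction, int minWbSize) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (preallocFraction <= 0.0 || preallocFraction > 1.0) {
            throw new IllegalArgumentException("preallocFraction must be in (0, 1]");
        }
        long full = estimatedTableSize / numPartitions;   // original sizing
        long scaled = (long) (full * preallocFraction);   // reduced sizing
        // Assumption: never go below minWbSize, and stay within int range.
        long clamped = Math.max(minWbSize, Math.min(scaled, Integer.MAX_VALUE));
        return (int) clamped;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 64 MB estimate, 16 partitions, preallocate 25%.
        int wb = scaledWriteBufferSize(64L * 1024 * 1024, 16, 0.25, 512 * 1024);
        System.out.println("writeBufferSize = " + wb + " bytes");
    }
}
```

      Under this sketch, partitions that fill their first buffer would allocate additional buffers on demand, so an overestimate costs at most the preallocated fraction.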

      Attachments

        1. HIVE-10793.1.patch
          5 kB
          Mostafa Mokhtar
        2. HIVE-10793.2.patch
          5 kB
          Mostafa Mokhtar

          People

            Assignee: Mostafa Mokhtar (mmokhtar)
            Reporter: Mostafa Mokhtar (mmokhtar)
            Votes: 0
            Watchers: 5
