HIVE-12837

Better memory estimation/allocation for hybrid grace hash join during hash table loading


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: Hive
    • Labels: None

    Description

      This addresses an edge case in which very little memory is available (less than the size of a single write buffer) when we start loading the hash table. Because the write buffer is allocated lazily, we can easily run out of memory before we even check whether any hash partition should be spilled.

      e.g.
      Total memory available: 210 MB
      Size of ref array of BytesBytesMultiHashMap for each hash partition: ~16 MB
      Size of write buffer: 8 MB (lazy allocation)
      Number of hash partitions: 16
      Number of hash partitions created in memory: 13
      Number of hash partitions created on disk: 3
      Available memory left after HybridHashTableContainer initialization: 210 - 16 * 13 = 2 MB

      Now suppose a row is to be loaded into an in-memory hash partition. Loading it will try to allocate an 8 MB write buffer for that partition, but only 2 MB is left, so we hit an OOM.
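
      The arithmetic in this example can be checked with a small Java sketch. The class and method names below are hypothetical and are not taken from the Hive source; the sketch only reproduces the numbers given above.

```java
// Hypothetical helper; names are illustrative and not taken from the Hive source.
final class HashJoinMemoryExample {

    /** Memory left (in MB) after allocating one ref array per in-memory partition. */
    static long remainingAfterInitMb(long totalMb, long refArrayMb, int inMemoryPartitions) {
        return totalMb - refArrayMb * inMemoryPartitions;
    }

    /** True if a lazily allocated write buffer still fits in the remaining memory. */
    static boolean writeBufferFits(long remainingMb, long writeBufferMb) {
        return remainingMb >= writeBufferMb;
    }

    public static void main(String[] args) {
        // Numbers from the example: 210 MB total, 16 MB ref arrays, 13 in-memory partitions.
        long remaining = remainingAfterInitMb(210, 16, 13);
        System.out.println("Remaining after init: " + remaining + " MB");
        // The first row loaded needs an 8 MB write buffer, which does not fit.
        System.out.println("8 MB write buffer fits: " + writeBufferFits(remaining, 8));
    }
}
```

      With the example's numbers, 2 MB remains and the 8 MB write buffer does not fit, which is the OOM condition described above.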

      The solution is to perform the spill check earlier, so that partitions can be spilled when memory is about to fill up and the OOM is avoided.
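
      A minimal sketch of the idea, not the code in the attached patches: before a partition lazily allocates its write buffer, check whether the buffer fits and spill in-memory partitions if it does not. Every name here is hypothetical. The sketch also assumes one write buffer per partition and picks the largest in-memory partition as the spill victim; the real policy may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the early spill check; none of these names come from Hive.
final class EarlySpillSketch {
    static final class Partition {
        long memoryMb;           // ref array plus write buffer, if one is held
        boolean spilled;         // true once the partition lives on disk
        boolean hasWriteBuffer;  // the write buffer is allocated lazily

        Partition(long memoryMb, boolean spilled) {
            this.memoryMb = memoryMb;
            this.spilled = spilled;
        }
    }

    final List<Partition> partitions = new ArrayList<>();
    final long memoryLimitMb;
    final long writeBufferMb;

    EarlySpillSketch(long memoryLimitMb, long writeBufferMb) {
        this.memoryLimitMb = memoryLimitMb;
        this.writeBufferMb = writeBufferMb;
    }

    /** Builds the scenario from the description: 13 partitions in memory, 3 on disk. */
    static EarlySpillSketch fromExample() {
        EarlySpillSketch s = new EarlySpillSketch(210, 8);
        for (int i = 0; i < 16; i++) {
            boolean onDisk = i >= 13;
            s.partitions.add(new Partition(onDisk ? 0 : 16, onDisk));
        }
        return s;
    }

    long memoryUsedMb() {
        long used = 0;
        for (Partition p : partitions) used += p.memoryMb; // spilled partitions hold 0
        return used;
    }

    int spilledCount() {
        int n = 0;
        for (Partition p : partitions) if (p.spilled) n++;
        return n;
    }

    /** Spills the largest in-memory partitions until neededMb fits under the limit. */
    void ensureRoom(long neededMb) {
        while (memoryUsedMb() + neededMb > memoryLimitMb) {
            Partition victim = null;
            for (Partition p : partitions) {
                if (!p.spilled && (victim == null || p.memoryMb >= victim.memoryMb)) victim = p;
            }
            if (victim == null) throw new IllegalStateException("nothing left to spill");
            victim.spilled = true;          // a real implementation would write it to disk here
            victim.memoryMb = 0;
            victim.hasWriteBuffer = false;
        }
    }

    /** Returns true if the row stays in memory, false if its partition is on disk. */
    boolean addRow(int partitionIndex) {
        Partition p = partitions.get(partitionIndex);
        if (!p.spilled && !p.hasWriteBuffer) {
            ensureRoom(writeBufferMb);      // check BEFORE the lazy allocation
            if (!p.spilled) {
                p.memoryMb += writeBufferMb;
                p.hasWriteBuffer = true;
            }
        }
        return !p.spilled;
    }

    public static void main(String[] args) {
        EarlySpillSketch s = fromExample();
        System.out.println("Row kept in memory: " + s.addRow(0));
        System.out.println("Memory used: " + s.memoryUsedMb() + " MB, spilled: " + s.spilledCount());
    }
}
```

      In this model, loading the first row spills one more partition (4 on disk in total) and leaves 200 MB in use, instead of failing on the 8 MB allocation.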

      Attachments

        1. HIVE-12837.1.patch
          6 kB
          Wei Zheng
        2. HIVE-12837.2.patch
          7 kB
          Wei Zheng
        3. HIVE-12837.3.patch
          8 kB
          Wei Zheng
        4. HIVE-12837.4.patch
          7 kB
          Wei Zheng
        5. HIVE-12837.5.patch
          7 kB
          Wei Zheng

            People

              Assignee: Wei Zheng (wzheng)
              Reporter: Wei Zheng (wzheng)