  Hive / HIVE-20203

Arrow SerDe leaks a DirectByteBuffer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: None
    • Labels: None

    Description

      ArrowColumnarBatchSerDe allocates an Arrow NullableMapVector for each task that uses the SerDe.

      The vector is backed by a DirectByteBuffer allocated from Arrow's off-heap buffer pool.

      This buffer is never closed, so each task leaks about 1 KB of physical memory.

      This patch does three things:

      1. Ensures the buffer is closed when the RecordWriter for the task is closed.
      2. Adds per-task memory accounting by assigning each task a ChildAllocator created from the RootAllocator.
      3. Enforces that the ChildAllocator for a task has released all memory assigned to it when the task completes.
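
      The three steps above can be illustrated with a small, self-contained model. This is only a sketch of the accounting pattern, not the patch itself and not Arrow's API: ToyAllocator, ToyBuffer, and TaskAllocatorDemo are hypothetical names invented for illustration, and the byte counts are arbitrary.

```java
/** Toy stand-in for an off-heap allocator (hypothetical; NOT Arrow's BufferAllocator). */
class ToyAllocator implements AutoCloseable {
    private final String name;
    private final ToyAllocator parent; // null for the root allocator
    private long allocatedBytes = 0;
    private boolean closed = false;

    ToyAllocator(String name, ToyAllocator parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a child whose allocations are also charged to this allocator. */
    ToyAllocator newChild(String childName) {
        return new ToyAllocator(childName, this);
    }

    /** Charges the bytes to this allocator and every ancestor. */
    ToyBuffer allocate(long bytes) {
        if (closed) {
            throw new IllegalStateException("Allocator " + name + " is closed");
        }
        for (ToyAllocator a = this; a != null; a = a.parent) {
            a.allocatedBytes += bytes;
        }
        return new ToyBuffer(this, bytes);
    }

    /** Returns the bytes to this allocator and every ancestor. */
    void release(long bytes) {
        for (ToyAllocator a = this; a != null; a = a.parent) {
            a.allocatedBytes -= bytes;
        }
    }

    long getAllocatedBytes() {
        return allocatedBytes;
    }

    /** Step 3: closing fails loudly if the task still holds memory. */
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (allocatedBytes != 0) {
            throw new IllegalStateException(
                "Allocator " + name + " leaked " + allocatedBytes + " bytes");
        }
    }
}

/** Toy buffer handle; closing it returns its bytes to the owning allocator. */
class ToyBuffer implements AutoCloseable {
    private final ToyAllocator owner;
    private final long size;
    private boolean released = false;

    ToyBuffer(ToyAllocator owner, long size) {
        this.owner = owner;
        this.size = size;
    }

    @Override
    public void close() {
        if (!released) {
            released = true;
            owner.release(size);
        }
    }
}

class TaskAllocatorDemo {
    public static void main(String[] args) {
        ToyAllocator root = new ToyAllocator("root", null);
        ToyAllocator task = root.newChild("task-0"); // step 2: one child per task
        ToyBuffer vector = task.allocate(1024);
        vector.close();                              // step 1: release the buffer
        task.close();                                // step 3: verify nothing is outstanding
        System.out.println("root bytes after task: " + root.getAllocatedBytes());
    }
}
```

      In this model, skipping vector.close() makes task.close() throw, so a leak surfaces at task completion instead of accumulating silently in the root allocator.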

      The patch assumes that close() is always called on the RecordWriter when a task finishes, even if the task fails during execution.
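
      One way a caller can satisfy this assumption is try-with-resources, which runs close() whether or not the body throws. The sketch below is illustrative only: SketchWriter, FailingWriter, and CloseOnFailureDemo are hypothetical names, not Hive's RecordWriter interface or the code path Hive actually uses.

```java
/** Hypothetical writer interface for illustration; NOT Hive's RecordWriter. */
interface SketchWriter extends AutoCloseable {
    void write(String row);

    @Override
    void close();
}

/** Writer that fails on the row "bad" and records whether close() ran. */
class FailingWriter implements SketchWriter {
    private boolean closed = false;

    @Override
    public void write(String row) {
        if ("bad".equals(row)) {
            throw new IllegalArgumentException("cannot write row: " + row);
        }
    }

    @Override
    public void close() {
        closed = true; // a real writer would release its buffers here
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseOnFailureDemo {
    /** Writes all rows; returns false if the task failed. The writer is closed either way. */
    static boolean runTask(SketchWriter writer, String[] rows) {
        try (SketchWriter w = writer) {
            for (String row : rows) {
                w.write(row);
            }
            return true;
        } catch (RuntimeException e) {
            // By the time this handler runs, w.close() has already been called.
            return false;
        }
    }

    public static void main(String[] args) {
        FailingWriter writer = new FailingWriter();
        boolean ok = runTask(writer, new String[] {"a", "bad", "c"});
        System.out.println("task succeeded: " + ok + ", writer closed: " + writer.isClosed());
    }
}
```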

      Attachments

        1. HIVE-20203.4.patch
          11 kB
          Eric Wohlstadter
        2. HIVE-20203.3.patch
          8 kB
          Eric Wohlstadter
        3. HIVE-20203.2.patch
          7 kB
          Eric Wohlstadter
        4. HIVE-20203.1.patch
          7 kB
          Eric Wohlstadter

          People

            Assignee: Eric Wohlstadter (ewohlstadter)
            Reporter: Eric Wohlstadter (ewohlstadter)