Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
ArrowColumnarBatchSerDe allocates an Arrow NullableMapVector for each task that uses the SerDe.
The vector is backed by a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
This buffer is never closed, so each task leaks about 1 KB of physical memory.
This patch does three things:
- Ensures the buffer is closed when the RecordWriter for the task is closed.
- Adds per-task memory accounting by assigning each task a ChildAllocator created from the RootAllocator.
- Enforces that the ChildAllocator for a task has released all memory assigned to it when the task completes.
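The three changes above can be pictured with a small, self-contained model. This is only a sketch in plain Java, not the patch itself: TaskAllocator and TaskRecordWriter are hypothetical stand-ins rather than Arrow or Hive classes, and the 1024-byte figure simply mirrors the "about 1K" mentioned in the description.

```java
// Hypothetical stand-in for a per-task allocator; NOT Arrow's actual API.
// Single-threaded model: no synchronization is attempted.
class TaskAllocator implements AutoCloseable {
    private final String name;
    private final TaskAllocator parent; // null for the root allocator
    private long allocatedBytes;        // bytes currently outstanding

    TaskAllocator(String name, TaskAllocator parent) {
        this.name = name;
        this.parent = parent;
    }

    // Each task gets its own child so its usage can be accounted separately.
    TaskAllocator newChild(String childName) {
        return new TaskAllocator(childName, this);
    }

    // Record an allocation against this allocator and every ancestor.
    void allocate(long bytes) {
        for (TaskAllocator a = this; a != null; a = a.parent) {
            a.allocatedBytes += bytes;
        }
    }

    // Record a release against this allocator and every ancestor.
    void release(long bytes) {
        for (TaskAllocator a = this; a != null; a = a.parent) {
            a.allocatedBytes -= bytes;
        }
    }

    long getAllocatedBytes() {
        return allocatedBytes;
    }

    // Closing with outstanding bytes is treated as a leak and fails loudly.
    @Override
    public void close() {
        if (allocatedBytes != 0) {
            throw new IllegalStateException(
                "Allocator '" + name + "' leaked " + allocatedBytes + " bytes");
        }
    }
}

// Hypothetical writer that owns one buffer for the lifetime of a task.
class TaskRecordWriter implements AutoCloseable {
    private static final long VECTOR_BYTES = 1024; // assumed size, for illustration
    private final TaskAllocator allocator;
    private boolean closed;

    TaskRecordWriter(TaskAllocator root, String taskId) {
        this.allocator = root.newChild(taskId);
        this.allocator.allocate(VECTOR_BYTES); // stands in for creating the vector
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is a no-op
        }
        closed = true;
        allocator.release(VECTOR_BYTES); // release the buffer first
        allocator.close();               // then verify nothing is outstanding
    }
}
```

In this model, forgetting the release in close() makes the child allocator's close() throw, which is the kind of enforcement the third bullet describes.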
The patch assumes that close() is always called on the RecordWriter when a task is finished (even if there is a failure during task execution).
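The ticket does not say how the caller guarantees that close() runs. As an illustration only, Java's try-with-resources gives this guarantee for any AutoCloseable, including when the task body throws. CountingWriter and TaskRunner below are hypothetical names invented for this sketch.

```java
// Hypothetical writer used only to show when close() runs.
class CountingWriter implements AutoCloseable {
    static int openWriters = 0; // number of writers not yet closed

    CountingWriter() {
        openWriters++;
    }

    void write(String row) {
        if (row == null) {
            throw new IllegalArgumentException("null row"); // simulated task failure
        }
    }

    @Override
    public void close() {
        openWriters--;
    }
}

class TaskRunner {
    // Runs one task; returns true on success and false if the task failed.
    static boolean runTask(String[] rows) {
        try (CountingWriter writer = new CountingWriter()) {
            for (String row : rows) {
                writer.write(row);
            }
            return true;
        } catch (IllegalArgumentException e) {
            // The writer has already been closed by the time this handler runs.
            return false;
        }
    }
}
```

Whether the task succeeds or fails, the writer is closed before runTask returns, so any cleanup placed in close() is not skipped on the failure path.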