Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
3.1.1, 4.0.0
None
Description
The lack of GC pauses is killing LLAP containers whenever a significant amount of memory is consumed by off-heap structures, which aren't cleaned up automatically until the GC runs.
There's a java.nio.DirectByteBuffer.Deallocator which runs when direct buffers are garbage collected; it is what actually cleans up the underlying off-heap buffers.
With no garbage collection activity for several hours while responding to queries, these off-heap structures build up until the container's memory footprint forces YARN to kill the process.
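One way to watch this build-up is to read the JVM's "direct" buffer pool through the standard BufferPoolMXBean. The sketch below is illustrative only: the class name DirectMemoryProbe is hypothetical, and it assumes the off-heap structures in question are allocated as direct ByteBuffers, which the pool reports on.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not part of Hive): reports memory held by direct buffers. */
final class DirectMemoryProbe {
    /**
     * Returns the bytes currently held by the JVM's "direct" buffer pool,
     * or -1 if the JVM does not report such a pool.
     */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }
}
```

If this number keeps climbing while the GC collection counts stay flat, the process is in the situation described above: the buffers are unreachable, but nothing has run to release them.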
It is better to hit a GC pause occasionally rather than to lose a node every few hours.
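A minimal sketch of the "occasional GC" idea, not the actual patch: request a collection only when none has happened since the previous check. The class IdleGcTrigger and its methods are hypothetical names chosen for illustration. Note that System.gc() is only a request to the JVM and is ignored when the JVM runs with -XX:+DisableExplicitGC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

/** Hypothetical helper: forces a GC only if none ran since the last check. */
final class IdleGcTrigger {
    private final LongSupplier gcCount; // number of collections seen so far
    private final Runnable forceGc;     // action that requests a collection
    private long lastSeenCount;

    IdleGcTrigger(LongSupplier gcCount, Runnable forceGc) {
        this.gcCount = gcCount;
        this.forceGc = forceGc;
        this.lastSeenCount = gcCount.getAsLong();
    }

    /** Wires the trigger to the real JVM collectors and System.gc(). */
    static IdleGcTrigger forJvm() {
        return new IdleGcTrigger(IdleGcTrigger::jvmGcCount, System::gc);
    }

    /** Sums collection counts over all collectors; -1 ("undefined") is skipped. */
    static long jvmGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Call once per check interval; returns true if a GC was requested. */
    synchronized boolean check() {
        long now = gcCount.getAsLong();
        boolean idle = (now == lastSeenCount);
        if (idle) {
            forceGc.run();
            now = gcCount.getAsLong(); // account for the collection just requested
        }
        lastSeenCount = now;
        return idle;
    }
}
```

In a daemon, check() on an instance from IdleGcTrigger.forJvm() could be scheduled with a ScheduledExecutorService at an interval much shorter than the "every few hours" failure window. Passing the counter and the GC action in through the constructor keeps the decision logic testable without triggering real collections.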