Thanks to Bill Liu for triaging the issue.
When taskmanager.memory.off-heap is disabled, we observed that the total memory that Flink allocates exceeds the total memory of the container:
For an 8 GB container, the JobManager starts the container with the following parameters:
The heap memory plus the off-heap memory exceeds the total memory of the container. As a result, YARN occasionally kills the container.
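
To make the arithmetic concrete, here is a minimal sketch of the check that fails. The heap and off-heap sizes are hypothetical values chosen for illustration, not the ones from the actual launch command, and it assumes the two limits correspond to the JVM's -Xmx and -XX:MaxDirectMemorySize settings. The class and method names are made up for this example and are not part of Flink.

```java
public class ContainerMemoryCheck {

    /**
     * Returns true when heap plus off-heap memory is larger than the
     * container size, i.e. when YARN may kill the container.
     * All sizes are in megabytes.
     */
    public static boolean exceedsContainer(long containerMb, long heapMb, long offHeapMb) {
        return heapMb + offHeapMb > containerMb;
    }

    public static void main(String[] args) {
        long containerMb = 8192; // 8 GB container, as in the report
        long heapMb = 6144;      // hypothetical heap limit (assumed -Xmx)
        long offHeapMb = 3072;   // hypothetical off-heap limit (assumed -XX:MaxDirectMemorySize)

        // 6144 + 3072 = 9216 MB, which is more than the 8192 MB container
        System.out.println(exceedsContainer(containerMb, heapMb, offHeapMb));
    }
}
```

With these assumed numbers the JVM is allowed to grow to 9216 MB inside an 8192 MB container, so the container is only killed once actual usage crosses the limit, which would match the occasional kills described above.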