Affects Version/s: None
Fix Version/s: 0.14.0
We noticed an orphaned container in the LinkedIn production environment (using samza-yarn).
The ContainerHeartbeatMonitor, added as part of
SAMZA-871 to solve this problem, was still alive in the orphaned container's Java process but did not shut the process down.
ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to periodically check whether the container is orphaned.
From the following process thread dump, it is apparent that the worker thread in the ScheduledExecutorService found the task queue empty and went into the waiting state (expecting new tasks to be added to the queue).
If the execution of a Runnable submitted to ScheduledExecutorService.scheduleAtFixedRate throws an exception, subsequent executions are suppressed.
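The suppression behaviour can be reproduced in isolation. The following is a minimal sketch, not Samza code: the class name SuppressedScheduleDemo, the method runsBeforeSuppression and the simulated failure on the second run are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Samza.
class SuppressedScheduleDemo {

    // Schedules a periodic task that throws on its second run and
    // returns how many times the task actually executed.
    static int runsBeforeSuppression() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(() -> {
            if (runs.incrementAndGet() == 2) {
                // Simulated unchecked failure inside the periodic task.
                throw new IllegalStateException("simulated failure");
            }
        }, 0, 50, TimeUnit.MILLISECONDS);
        try {
            // For a periodic task, get() only returns (by throwing) once the
            // task has failed or been cancelled.
            future.get();
        } catch (ExecutionException e) {
            // Expected: wraps the IllegalStateException thrown above.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            // Wait several periods to show that no further runs happen.
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs = " + runsBeforeSuppression());
    }
}
```

After the failing run the executor's worker thread stays alive but has nothing left in its queue, which matches the idle waiting state described above.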
The existing ContainerHeartBeatClient implementation, which accesses the ApplicationMaster HTTP endpoint to get container liveness, handles only IOException. Any unchecked exception thrown from that code path will stop the ContainerHeartbeatMonitor's periodic check (this is the suspected cause).
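One possible mitigation, assuming the suspected cause is confirmed, is to make sure no unchecked exception escapes the scheduled Runnable. The sketch below is not the actual Samza fix: GuardedHeartbeatSketch, guarded and runsDespiteFailures are hypothetical names, and the always-failing task merely stands in for a liveness check that throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not part of Samza.
class GuardedHeartbeatSketch {

    // Wraps a task so that unchecked exceptions are logged instead of
    // propagating to the scheduler, which would suppress later runs.
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("Periodic check failed, will retry: " + e);
            }
        };
    }

    // Schedules a task that throws on every run and returns how many
    // times it ran before the scheduler was shut down.
    static int runsDespiteFailures(int targetRuns) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(targetRuns);
        scheduler.scheduleAtFixedRate(guarded(() -> {
            runs.incrementAndGet();
            done.countDown();
            throw new IllegalStateException("simulated unchecked failure");
        }), 0, 20, TimeUnit.MILLISECONDS);
        try {
            // Bounded wait so the sketch cannot hang if something goes wrong.
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }
}
```

Calling GuardedHeartbeatSketch.runsDespiteFailures(3) should observe at least three executions even though every run throws. Whether a failed liveness check should be retried or treated as grounds for shutting the container down is a separate design decision that this sketch does not address.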
This requires further investigation.