Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.6.3, 1.8.0
Description
Issue detail info
In our Flink (1.6.3) production environment, I often run into a situation where the YARN application cannot stop after a Flink job fails in per-job YARN cluster mode, so I analyzed in depth why this happens.
When a Flink job fails, the system writes an archive file to a FileSystem through the MiniDispatcher#archiveExecutionGraph method and then notifies YarnJobClusterEntrypoint to shut down. However, if MiniDispatcher#archiveExecutionGraph throws an exception during execution, the calls that follow it, including the shutdown notification, are affected.
I opened FLINK-12247 to fix the NPE that occurs when the system writes the archive to the FileSystem. But we still need to consider other exceptions, so we should catch Exception / Throwable, not just IOException.
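The idea can be sketched in isolation as follows. This is not Flink's actual code: the class `ArchiveThenShutdown`, the `Archiver` interface, and the methods `archiveQuietly` and `onJobTerminated` are hypothetical names made up for illustration. The sketch only shows the pattern of guarding the archive step with a broad catch so that the shutdown signal is always sent.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

public class ArchiveThenShutdown {

    /** Hypothetical stand-in for the step that writes the job archive. */
    public interface Archiver {
        void archive(String jobId) throws IOException;
    }

    /**
     * Runs the archive step and reports whether it succeeded.
     * Catches Throwable rather than only IOException, so an unexpected
     * failure (for example a NullPointerException) cannot escape.
     */
    public static boolean archiveQuietly(Archiver archiver, String jobId) {
        try {
            archiver.archive(jobId);
            return true;
        } catch (Throwable t) {
            System.err.println("Could not archive job " + jobId + ": " + t);
            return false;
        }
    }

    /**
     * Hypothetical termination handler: archive first, then always signal
     * shutdown, regardless of whether archiving worked.
     */
    public static CompletableFuture<String> onJobTerminated(Archiver archiver, String jobId) {
        CompletableFuture<String> shutdownFuture = new CompletableFuture<>();
        archiveQuietly(archiver, jobId);
        // Reached even when the archive step failed.
        shutdownFuture.complete("SHUTDOWN_REQUESTED");
        return shutdownFuture;
    }

    public static void main(String[] args) {
        // Simulate an archive step that fails with an unchecked exception.
        Archiver failing = jobId -> {
            throw new NullPointerException("simulated archive failure");
        };
        CompletableFuture<String> shutdown = onJobTerminated(failing, "job-1");
        System.out.println("Shutdown signalled: " + shutdown.isDone());
    }
}
```

Catching Throwable also swallows errors such as OutOfMemoryError, so whether to catch Exception or Throwable here is a trade-off between guaranteed shutdown and hiding fatal conditions.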
Flink yarn job fail flow
Flink yarn job fail on yarn
Flink yarn application can't stop