When a job is cancelled, we expect to see it in Flink's history server. However, I cannot see my job after it is cancelled.
After digging into the problem, I found that the function archiveExecutionGraph is not executed. Below is a brief log:
From the log, we can see that the job has not finished when the dispatcher closes. The sequence of events is as follows:
- The cancel command is received and sent to all tasks asynchronously.
- MiniDispatcher begins shutting down the per-job cluster.
- The dispatcher is stopped and the job is removed.
- The job is cancelled, and the callback registered in method startJobManagerRunner is executed.
- Because the job has already been removed, currentJobManagerRunner is null, which does not equal the original jobManagerRunner. In this case, the archivedExecutionGraph is not uploaded.
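The race described above can be illustrated with a heavily simplified, single-threaded model. This is a sketch under stated assumptions, not Flink's actual code: `SimpleDispatcher`, `Runner`, `removeJob`, the `runners` map and the `archived` list are hypothetical stand-ins, and archiving is reduced to adding a string to a list. The only part taken from the report is the check in the callback: archive only if the runner currently registered for the job is the same one that was started. Running both orderings shows that archiving happens only when cancellation completes before the job is removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class ArchiveRaceSketch {

    /** Hypothetical stand-in for a JobManagerRunner; only its identity matters here. */
    static final class Runner {
        final CompletableFuture<String> resultFuture = new CompletableFuture<>();
    }

    /** Hypothetical, heavily simplified stand-in for the dispatcher. */
    static final class SimpleDispatcher {
        final Map<String, Runner> runners = new HashMap<>();
        final List<String> archived = new ArrayList<>();

        Runner startJobManagerRunner(String jobId) {
            Runner runner = new Runner();
            runners.put(jobId, runner);
            // Models the check from the report: archive only if the runner
            // registered for this job is still the one started here.
            runner.resultFuture.thenAccept(graph -> {
                Runner current = runners.get(jobId); // null once the job was removed
                if (current == runner) {
                    archived.add(graph); // stands in for archiveExecutionGraph
                }
            });
            return runner;
        }

        void removeJob(String jobId) {
            runners.remove(jobId);
        }
    }

    /** Runs one ordering and returns what was archived. */
    static List<String> run(boolean cancelBeforeStop) {
        SimpleDispatcher dispatcher = new SimpleDispatcher();
        Runner runner = dispatcher.startJobManagerRunner("job-1");
        if (cancelBeforeStop) {
            runner.resultFuture.complete("graph-of-job-1"); // job is cancelled first
            dispatcher.removeJob("job-1");                  // then the dispatcher stops
        } else {
            dispatcher.removeJob("job-1");                  // dispatcher stops first
            runner.resultFuture.complete("graph-of-job-1"); // cancellation completes later
        }
        return dispatcher.archived;
    }

    public static void main(String[] args) {
        System.out.println("cancel first: archived = " + run(true));
        System.out.println("stop first:   archived = " + run(false));
    }
}
```

Because the callback is registered before the future completes, `complete()` runs it synchronously on the calling thread, which makes both orderings deterministic here. In the real system the two steps happen on different threads, so either ordering can occur.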
In normal cases, I find that the job is cancelled first and the dispatcher is stopped afterwards, so archiving the archivedExecutionGraph succeeds. However, I think this order is not guaranteed, and it is hard to know which comes first.
The above is what I suspect. If it is correct, we should fix it.