Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
At present, when a checkpoint barrier arrives out of order, the Task side only prints an out-of-order log message. On the JobManager (JM) side, the checkpoint can only fail through a timeout, so the real cause cannot be determined there.
Therefore, I think the JM side should receive failure information for out-of-order checkpoints, so that the checkpoint fails with its actual reason instead of a timeout.
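The current Task-side handling:

```java
// The barrier belongs to a checkpoint that is not newer than the last one seen
// (it was probably aborted earlier), so it is treated as out of order.
if (lastCheckpointId >= metadata.getCheckpointId()) {
    // Only a log line is produced; nothing tells the JM why this checkpoint stalls.
    LOG.info(
            "Out of order checkpoint barrier (aborted previously?): {} >= {}",
            lastCheckpointId,
            metadata.getCheckpointId());
    // Abort any channel state already written for this checkpoint.
    channelStateWriter.abort(metadata.getCheckpointId(), new CancellationException(), true);
    checkAndClearAbortedStatus(metadata.getCheckpointId());
    return;
}
```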
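A minimal sketch of the proposed behavior, assuming the Task has some way to send a decline message carrying a reason to the JM. `DeclineReporter`, `OutOfOrderBarrierSketch` and `onBarrier` are hypothetical names for illustration only, not Flink's actual API, and the channel-state abort and aborted-status cleanup from the existing snippet are left out to keep the focus on reporting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: report out-of-order barriers to the JM instead of only logging them. */
public class OutOfOrderBarrierSketch {

    /** Hypothetical stand-in for whatever channel the Task uses to decline a checkpoint to the JM. */
    interface DeclineReporter {
        void decline(long checkpointId, String reason);
    }

    private final DeclineReporter reporter;
    private long lastCheckpointId = -1L;

    OutOfOrderBarrierSketch(DeclineReporter reporter) {
        this.reporter = reporter;
    }

    /** Returns true if the barrier is accepted, false if it is declined as out of order. */
    boolean onBarrier(long checkpointId) {
        if (lastCheckpointId >= checkpointId) {
            // Instead of only logging, tell the JM why this checkpoint cannot complete.
            reporter.decline(checkpointId, String.format(
                    "Out of order checkpoint barrier (aborted previously?): %d >= %d",
                    lastCheckpointId, checkpointId));
            return false;
        }
        lastCheckpointId = checkpointId;
        return true;
    }

    public static void main(String[] args) {
        List<String> declined = new ArrayList<>();
        OutOfOrderBarrierSketch task = new OutOfOrderBarrierSketch(
                (id, reason) -> declined.add(id + ": " + reason));
        task.onBarrier(5);
        task.onBarrier(3); // out of order: declined with a reason
        System.out.println(declined);
    }
}
```

With a reason attached to the decline, the JM could fail checkpoint 3 immediately and surface the out-of-order cause, rather than waiting for the checkpoint timeout.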