Details
- Type: Bug
- Status: Open
- Priority: Not a Priority
- Resolution: Unresolved
- Affects Version/s: 1.11.3, 1.12.1, 1.13.0
- Fix Version/s: None
Description
Task.executionState and Task.failureCause are not set atomically. This became an issue when implementing the exception history (FLINK-21187), where we relied on the invariant that a failureCause is present whenever the Task has failed.
Adding a check for this invariant to Task.notifyFinalState() reveals the race condition.
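A minimal sketch of what such an invariant check could look like. The class `FinalStateInvariant`, its `State` enum, and the method `check` are hypothetical stand-ins for illustration; they are not Flink's actual `Task` or `ExecutionState` code.

```java
/** Hypothetical stand-in for the invariant check; not Flink's actual code. */
final class FinalStateInvariant {

    /** Simplified set of final and non-final states, for illustration only. */
    enum State { RUNNING, FINISHED, CANCELED, FAILED }

    /** Throws if a FAILED state is reported without a failure cause. */
    static void check(State state, Throwable failureCause) {
        if (state == State.FAILED && failureCause == null) {
            throw new IllegalStateException(
                    "Task is FAILED but no failureCause was recorded.");
        }
    }

    public static void main(String[] args) {
        // Both calls satisfy the invariant and return normally.
        check(State.FINISHED, null);
        check(State.FAILED, new RuntimeException("boom"));

        // FAILED without a cause is the situation the race can produce.
        try {
            check(State.FAILED, null);
        } catch (IllegalStateException e) {
            System.out.println("Invariant violated: " + e.getMessage());
        }
    }
}
```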
TaskExecutorSlotLifetimeTest becomes unstable once this invariant is added. The reason is that the test starts a task but does not wait for it to finish. Task finalization and the task cancellation triggered by the TaskManager shutdown compete with each other, which can cause the executionState to be set to FAILED while the failureCause is still null. This state is then forwarded to Execution through Task.notifyFinalState().
We should set failureCause in the same step that sets the executionState to FAILED, so that no caught error is missed.
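One possible way to do this, sketched under assumptions: keep the state and the cause in a single immutable object and swap it with one compare-and-set, so no reader can observe FAILED without a cause. The names `AtomicTaskState`, `Snapshot`, and `transition` are hypothetical and do not reflect how Flink's `Task` is implemented or how this issue was eventually addressed.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of an atomic state/cause holder; not Flink's Task class. */
final class AtomicTaskState {

    enum State { RUNNING, CANCELING, CANCELED, FINISHED, FAILED }

    /** Immutable pair, so state and cause are always published together. */
    static final class Snapshot {
        final State state;
        final Throwable failureCause; // expected to be null unless state == FAILED

        Snapshot(State state, Throwable failureCause) {
            if (state == State.FAILED && failureCause == null) {
                throw new IllegalArgumentException("FAILED requires a failure cause");
            }
            this.state = state;
            this.failureCause = failureCause;
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(State.RUNNING, null));

    /**
     * Moves from {@code expected} to {@code target} with a single compare-and-set.
     * Returns false if another thread changed the state first.
     */
    boolean transition(State expected, State target, Throwable cause) {
        Snapshot seen = current.get();
        if (seen.state != expected) {
            return false;
        }
        return current.compareAndSet(seen, new Snapshot(target, cause));
    }

    Snapshot snapshot() {
        return current.get();
    }

    public static void main(String[] args) {
        AtomicTaskState task = new AtomicTaskState();
        // The failure path wins the race; the competing cancellation is rejected.
        boolean failed = task.transition(State.RUNNING, State.FAILED, new RuntimeException("boom"));
        boolean canceling = task.transition(State.RUNNING, State.CANCELING, null);
        Snapshot s = task.snapshot();
        System.out.println(failed + " " + canceling + " " + s.state + " " + s.failureCause.getMessage());
    }
}
```

With this layout, whichever of the competing paths (finalization or cancellation) wins the compare-and-set determines both fields at once, and the losing path sees its transition rejected instead of overwriting only one of them.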
Issue Links
- is blocked by
  - FLINK-22060 Move null handling from ErrorInfo into Task.notifyFinalState (Open)
- is related to
  - FLINK-21187 RootException history implementation in DefaultScheduler (Closed)