Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 1.10.0
Description
Global failure handling (full restarts) is widely used in ExecutionGraph components, and even in other components, to recover a job from an inconsistent state.
We need to support it in DefaultScheduler so as not to break this safety net. See here for more details.
Follow-ups to this task may replace some of these full restarts with JVM termination, in cases that are considered bugs or are otherwise not expected to happen.
Implementation plan:
1. Add getGlobalFailureHandlingResult(Throwable) to ExecutionFailureHandler.
2. Add a method handleGlobalFailure(Throwable) to the SchedulerNG interface and implement it in DefaultScheduler.
3. Add a method notifyGlobalFailure(Throwable) to InternalTaskFailuresListener and rework its implementations to call SchedulerNG#handleGlobalFailure.
4. Rework ExecutionGraph#failGlobal to invoke InternalTaskFailuresListener#notifyGlobalFailure when the NG scheduler is used.
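The four steps above can be sketched as a single call chain: ExecutionGraph#failGlobal notifies the listener, the listener forwards to SchedulerNG#handleGlobalFailure, and DefaultScheduler asks ExecutionFailureHandler what to do. This is a hypothetical, self-contained model and not Flink's actual code: the class and method names mirror the plan, but the FailureHandlingResult fields, the maxRestarts budget (a stand-in for a real restart strategy), the recorded "actions" list, and the unmodeled legacy path are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the implementation plan; not Flink's real classes.
public class GlobalFailureSketch {

    // Outcome of handling a failure: restart the listed vertices, or fail the job.
    static final class FailureHandlingResult {
        final boolean canRestart;
        final List<String> verticesToRestart;
        final Throwable cause;

        FailureHandlingResult(boolean canRestart, List<String> verticesToRestart, Throwable cause) {
            this.canRestart = canRestart;
            this.verticesToRestart = verticesToRestart;
            this.cause = cause;
        }
    }

    // Step 1: a global failure targets every vertex (a full restart). The
    // maxRestarts budget is an assumed stand-in for a real restart strategy.
    static final class ExecutionFailureHandler {
        private final List<String> allVertices;
        private final int maxRestarts;
        private int restarts;

        ExecutionFailureHandler(List<String> allVertices, int maxRestarts) {
            this.allVertices = Collections.unmodifiableList(new ArrayList<>(allVertices));
            this.maxRestarts = maxRestarts;
        }

        FailureHandlingResult getGlobalFailureHandlingResult(Throwable cause) {
            if (restarts >= maxRestarts) {
                return new FailureHandlingResult(false, Collections.emptyList(), cause);
            }
            restarts++;
            return new FailureHandlingResult(true, allVertices, cause);
        }
    }

    // Step 2: the scheduler owns the reaction to a global failure.
    interface SchedulerNG {
        void handleGlobalFailure(Throwable cause);
    }

    static final class DefaultScheduler implements SchedulerNG {
        private final ExecutionFailureHandler failureHandler;
        final List<String> actions = new ArrayList<>(); // records decisions for inspection

        DefaultScheduler(ExecutionFailureHandler failureHandler) {
            this.failureHandler = failureHandler;
        }

        @Override
        public void handleGlobalFailure(Throwable cause) {
            FailureHandlingResult result = failureHandler.getGlobalFailureHandlingResult(cause);
            if (result.canRestart) {
                actions.add("restart " + result.verticesToRestart);
            } else {
                actions.add("fail job: " + result.cause.getMessage());
            }
        }
    }

    // Step 3: the listener that ExecutionGraph reports failures to.
    interface InternalTaskFailuresListener {
        void notifyGlobalFailure(Throwable cause);
    }

    // Step 4: failGlobal delegates to the listener when the NG scheduler is in use.
    static final class ExecutionGraph {
        private final InternalTaskFailuresListener failuresListener; // null means legacy scheduler

        ExecutionGraph(InternalTaskFailuresListener failuresListener) {
            this.failuresListener = failuresListener;
        }

        void failGlobal(Throwable cause) {
            if (failuresListener == null) {
                throw new UnsupportedOperationException("legacy restart path not modeled here");
            }
            failuresListener.notifyGlobalFailure(cause);
        }
    }

    public static List<String> run() {
        ExecutionFailureHandler handler =
                new ExecutionFailureHandler(Arrays.asList("source", "map", "sink"), 1);
        DefaultScheduler scheduler = new DefaultScheduler(handler);
        // The step 3 listener simply forwards to SchedulerNG#handleGlobalFailure.
        ExecutionGraph graph = new ExecutionGraph(scheduler::handleGlobalFailure);
        graph.failGlobal(new IllegalStateException("inconsistent state"));
        graph.failGlobal(new IllegalStateException("inconsistent again"));
        return scheduler.actions;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With a budget of one restart, the first global failure restarts every vertex and the second fails the job. In this sketch, routing failGlobal through the listener keeps the restart decision in the scheduler rather than inside ExecutionGraph.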
Issue Links
blocks
- FLINK-14373 Enable ZooKeeperHighAvailabilityITCase to pass with scheduler NG (Closed)
- FLINK-14389 Restore task state in new DefaultScheduler (Closed)
links to