Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
When YarnAutoScalingManager detects that a Helix task is failing consistently, provide an option to send a WorkUnitChangeEvent so that GobblinHelixJobLauncher can handle the event and split the work unit at runtime. This can help resolve consistently failing containers (for example, due to OOM) at runtime instead of relying on the replanner to restart the whole pipeline.
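
One possible shape for the detection side is sketched below. It is only an illustration under stated assumptions: the class ConsistentFailureDetector, the SplitRequest type, the failure threshold, and the enable flag are all hypothetical stand-ins and are not Gobblin's actual YarnAutoScalingManager or WorkUnitChangeEvent API. The sketch counts consecutive failures per task and, when the option is enabled and the threshold is reached, notifies a listener that would play the role of the job launcher.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Gobblin classes. */
final class ConsistentFailureDetector {

  /** Stand-in for an event asking the launcher to split a task's work unit. */
  static final class SplitRequest {
    final String taskId;

    SplitRequest(String taskId) {
      this.taskId = taskId;
    }
  }

  private final int threshold;          // assumed: failures in a row that count as "consistent"
  private final boolean splitEnabled;   // assumed: the opt-in switch the ticket asks for
  private final Consumer<SplitRequest> listener;
  private final Map<String, Integer> consecutiveFailures = new HashMap<>();

  ConsistentFailureDetector(int threshold, boolean splitEnabled,
      Consumer<SplitRequest> listener) {
    if (threshold < 1) {
      throw new IllegalArgumentException("threshold must be at least 1");
    }
    this.threshold = threshold;
    this.splitEnabled = splitEnabled;
    this.listener = listener;
  }

  /** Records one task outcome and emits a split request when the threshold is hit. */
  void onTaskResult(String taskId, boolean succeeded) {
    if (succeeded) {
      // A success breaks the streak, so the task is no longer "consistently" failing.
      consecutiveFailures.remove(taskId);
      return;
    }
    int failures = consecutiveFailures.merge(taskId, 1, Integer::sum);
    if (splitEnabled && failures >= threshold) {
      // Reset the streak so the same task does not trigger again on its next failure.
      consecutiveFailures.remove(taskId);
      listener.accept(new SplitRequest(taskId));
    }
  }
}
```

With a threshold of 3, three failures in a row for the same task produce exactly one SplitRequest, and any success in between resets the count. When the option is disabled, failures are still counted but no request is sent, which leaves the existing replanner-based restart as the only recovery path.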