Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
Currently, when a TensorFlow job submitted through Submarine fails, all of its containers are destroyed, so there is no way to inspect the containers to locate the problem.
We therefore need YARN to support a parameter that keeps a job's containers alive instead of destroying them after the task fails.
If this feature is added, please also expose it through the YARN Service interface, because YARN's REST API is relatively lightweight and easy to use.