Details
- Type: Bug
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
Description
When ContainerRetryPolicy is NEVER_RETRY, the container work dir is not saved in the NM state store.
ContainerLaunch.java
...
private void recordContainerWorkDir(ContainerId containerId,
    String workDir) throws IOException {
  container.setWorkDir(workDir);
  if (container.isRetryContextSet()) {
    context.getNMStateStore().storeContainerWorkDir(containerId, workDir);
  }
}
After the NM restarts, container.workDir cannot be recovered and is null, which may cause exceptions.
One such problem already exists: if a resource localization request is sent while the container is running (YARN-1503) after an NM restart, the NM fails with the exception shown below.
So container.workDir always needs to be saved in the NM state store.
ContainerImpl.java
static class ResourceLocalizedWhileRunningTransition
    extends ContainerTransition {
  ...
  String linkFile = new Path(container.workDir, link).toString();
  ...
java.lang.IllegalArgumentException: Can not create a Path from a null string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.Path.<init>(Path.java:110)
... ...
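The behavior described above, and the proposed change, can be illustrated with a small self-contained sketch. It is not the attached patch and does not use the real NodeManager classes: FakeStateStore, WorkDirSketch, recordGuarded, recordAlways, and the container id and path values are all hypothetical stand-ins. It only assumes that the state store behaves like a key-value map and that a missing entry recovers as null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NM state store; only the work-dir mapping is modeled.
class FakeStateStore {
    private final Map<String, String> workDirs = new HashMap<>();

    void storeContainerWorkDir(String containerId, String workDir) {
        workDirs.put(containerId, workDir);
    }

    // Returns null when nothing was stored, mimicking a work dir that cannot be recovered.
    String recoverWorkDir(String containerId) {
        return workDirs.get(containerId);
    }
}

class WorkDirSketch {
    // Behavior described in the issue: persist only when a retry context is set.
    static void recordGuarded(FakeStateStore store, String id, String workDir,
                              boolean retryContextSet) {
        if (retryContextSet) {
            store.storeContainerWorkDir(id, workDir);
        }
    }

    // Proposed behavior: persist unconditionally.
    static void recordAlways(FakeStateStore store, String id, String workDir) {
        store.storeContainerWorkDir(id, workDir);
    }

    public static void main(String[] args) {
        String id = "container_example_0001";    // hypothetical container id
        String workDir = "/tmp/example/workdir"; // hypothetical work dir

        FakeStateStore guarded = new FakeStateStore();
        recordGuarded(guarded, id, workDir, false); // NEVER_RETRY: no retry context
        System.out.println("guarded: " + guarded.recoverWorkDir(id)); // prints "guarded: null"

        FakeStateStore always = new FakeStateStore();
        recordAlways(always, id, workDir);
        System.out.println("always: " + always.recoverWorkDir(id)); // prints the stored path
    }
}
```

In the guarded case the recovered value is null, which is the value that later reaches new Path(container.workDir, link) in the trace above. Storing unconditionally avoids that, at the cost of one extra state store write per container launch.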
Issue Links
- is related to: YARN-1503 Support making additional 'LocalResources' available to running containers (Open)