Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- None
- Reviewed
Description
When the NodeManager encounters the "No such file or directory" error below, reported against the "container-executor", it should stop participating in the cluster. In that state it cannot run any container, and staying in the cluster only causes the jobs scheduled on it to fail.
2023-01-18 10:08:10,600 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_e159_1673543180101_9407_02_000014 startLocalizer is : -1
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:403)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1250)
Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
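As an illustration only, the requested behavior could be sketched roughly as follows: inspect the cause chain of a launch failure for the "Cannot run program ... error=2" IOException shown in the log, and flip a health flag when it is found. The class ExecutorHealthSketch and its methods are hypothetical names made up for this sketch; they are not NodeManager APIs and this is not the committed patch.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of the idea in this issue; not actual YARN code. */
final class ExecutorHealthSketch {
    // Assumed to be polled by whatever reports node health to the cluster.
    private final AtomicBoolean healthy = new AtomicBoolean(true);
    private volatile String report = "";

    /** Walks the cause chain looking for the launch failure seen in the log. */
    static boolean isMissingExecutor(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (c instanceof IOException && msg != null
                    && msg.contains("Cannot run program")
                    && msg.contains("error=2")) {
                return true;
            }
        }
        return false;
    }

    /** Marks the node unhealthy only when the executor binary itself cannot be run. */
    void onContainerLaunchFailure(Throwable t) {
        if (isMissingExecutor(t)) {
            healthy.set(false);
            report = "container-executor cannot be run: " + t.getMessage();
        }
    }

    boolean isHealthy() {
        return healthy.get();
    }

    String getReport() {
        return report;
    }
}
```

Other failures, such as a non-zero exit code from a container, leave the flag untouched in this sketch, so an ordinary application failure does not take the node out of the cluster.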
Issue Links
- blocks: YARN-11715 NodeManager should recover by itself once the container-executor can run program again (Open)
- causes: YARN-11721 Do not mark the NM unhealthy when an app is killed (Resolved)