The implementation of
Tells the container-executor to write PIDs to cgroup.procs:
Apparently, this file has not always been writeable:
The RHEL build of the Linux kernel that I'm using ships a cgroup module whose cgroup.procs file is non-writeable.
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
As a result, when the container-executor tries to run, it fails with this error message:
fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
$ ls -l
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone.
3. Check the permissions on /cgroup.procs before writing to it, and fall back to /tasks if it is not writeable.
4. Add a config to yarn-site that lets admins specify which file to write to.
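
To make option 3 concrete, here is a rough sketch of the fallback check. It is not taken from the YARN source: the class name CgroupPidFileChooser and the method choosePidFile are hypothetical, and it assumes a POSIX file system. It inspects the owner-write mode bit rather than asking the OS whether the current user can write, on the assumption that an access-style check may report success for root even when the file is mode r--r--r--.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical helper for illustration only; not part of YARN.
final class CgroupPidFileChooser {
    private CgroupPidFileChooser() {}

    /**
     * Returns the file inside cgroupDir that PIDs should be written to:
     * cgroup.procs when it exists and has the owner-write bit set,
     * otherwise tasks.
     */
    static Path choosePidFile(Path cgroupDir) throws IOException {
        Path procs = cgroupDir.resolve("cgroup.procs");
        if (Files.exists(procs)) {
            // Look at the mode bits shown by `ls -l` instead of an
            // access()-style check (assumption: the caller may be root).
            Set<PosixFilePermission> perms = Files.getPosixFilePermissions(procs);
            if (perms.contains(PosixFilePermission.OWNER_WRITE)) {
                return procs;
            }
        }
        // Fall back to the per-thread tasks file.
        return cgroupDir.resolve("tasks");
    }
}
```

With the directory listing above, choosePidFile would return the tasks path, because cgroup.procs is r--r--r--. Option 4 could sit on top of this by letting a yarn-site setting override the chosen file name; any such property name would have to be decided as part of the fix.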