Hadoop YARN / YARN-799

CgroupsLCEResourcesHandler tries to write to cgroup.procs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha, 2.0.5-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: nodemanager
    • Labels:
      None

      Description

      The implementation of

      ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

      tells the container-executor to write PIDs to cgroup.procs:

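        public String getResourcesOption(ContainerId containerId) {
          String containerName = containerId.toString();
          // Option string handed to container-executor: "cgroups=<file>[,<file>...]"
          StringBuilder sb = new StringBuilder("cgroups=");
      
          if (isCpuWeightEnabled()) {
            // The executor writes the container's PID into <cpu cgroup dir>/cgroup.procs
            sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
            sb.append(",");
          }
      
          // Drop the trailing comma left after the last entry
          if (sb.charAt(sb.length() - 1) == ',') {
            sb.deleteCharAt(sb.length() - 1);
          }
          return sb.toString();
        }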
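      To make the option format concrete, here is a standalone sketch that rebuilds the string for the CPU controller only. The mount point /cgroup/cpu and the hierarchy name hadoop-yarn are assumptions read off the pwd output in this report; the class and helper names are hypothetical, and the real handler resolves these paths from its own configuration.

```java
import java.util.StringJoiner;

/** Hypothetical standalone model of the "cgroups=" option built by getResourcesOption(). */
public class ResourcesOptionDemo {
  // Assumed values, taken from the pwd output in this report; not read from any config.
  static final String CPU_MOUNT = "/cgroup/cpu";
  static final String HIERARCHY = "hadoop-yarn";

  // Stand-in for pathForCgroup(CONTROLLER_CPU, containerName).
  static String pathForCpuCgroup(String containerName) {
    return CPU_MOUNT + "/" + HIERARCHY + "/" + containerName;
  }

  // Joins the per-controller files with "," after the "cgroups=" prefix,
  // which avoids the manual trailing-comma cleanup; no entries yields "cgroups=".
  static String buildOption(String containerName, boolean cpuWeightEnabled, String pidFile) {
    StringJoiner files = new StringJoiner(",", "cgroups=", "");
    if (cpuWeightEnabled) {
      files.add(pathForCpuCgroup(containerName) + "/" + pidFile);
    }
    return files.toString();
  }

  public static void main(String[] args) {
    String id = "container_1370986842149_0001_01_000001";
    System.out.println(buildOption(id, true, "cgroup.procs"));
  }
}
```

      With CPU weighting enabled, main prints cgroups=/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001/cgroup.procs, which is exactly the file the executor later fails to write. Passing "tasks" as the last argument gives the variant used by the patch described below.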
      

      Apparently, cgroup.procs has not always been writable:

      https://patchwork.kernel.org/patch/116146/
      http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
      https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

      The RHEL kernel I'm using ships a cgroup implementation whose cgroup.procs file is not writable:

      $ uname -a
      Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

      As a result, the container-executor fails when it runs, logging the error from this statement:

      fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

      This happens because CgroupsLCEResourcesHandler passes the executor a resource option that points at cgroup.procs, which is read-only on this kernel:

      $ pwd
      /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
      $ ls -l
      total 0
      -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
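      The failing step is the executor writing the container's PID into the file named in the cgroups= option. container-executor itself is written in C; what follows is only a rough Java approximation of that step, with a hypothetical class name and a message loosely modelled on the fprintf above, to show that the write fails as soon as the target file cannot be opened for writing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical Java approximation of the executor's "write PID to cgroup file" step. */
public class PidWriteDemo {

  /**
   * Writes the PID, followed by a newline, to an existing cgroup control file.
   * Returns true on success; on failure logs a message and returns false.
   */
  static boolean writePid(Path cgroupFile, long pid) {
    try {
      // WRITE without CREATE: like a cgroup control file, the target must already exist.
      Files.write(cgroupFile,
          (pid + "\n").getBytes(StandardCharsets.US_ASCII),
          StandardOpenOption.WRITE);
      return true;
    } catch (IOException e) {
      // A read-only cgroup.procs would surface here, e.g. as an AccessDeniedException.
      System.err.printf("Failed to write pid %d to file %s - %s%n", pid, cgroupFile, e);
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path tasks = Files.createTempFile("tasks", null);
    System.out.println(writePid(tasks, 4242));
    System.out.println(writePid(tasks.resolveSibling("no-such-cgroup.procs"), 4242));
  }
}
```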

      I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.

      I can think of several potential resolutions to this ticket:

      1. Ignore the problem, and make people patch YARN when they hit this issue.
      2. Write to /tasks instead of /cgroup.procs for everyone.
      3. Check the permissions on /cgroup.procs before writing to it, and fall back to /tasks if it is not writable.
      4. Add a config to yarn-site that lets admins specify which file to write to.
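      Options 3 and 4 could be combined: honor an admin-supplied file name when one is set, and otherwise probe cgroup.procs and fall back to tasks. This is a minimal sketch under stated assumptions: the configured value stands in for a hypothetical yarn-site setting that has already been read, and the class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical chooser for the file that receives container PIDs. */
public class PidFileChooser {
  static final String CGROUP_PROCS = "cgroup.procs";
  static final String TASKS = "tasks";

  /**
   * @param cgroupDir  the container's cgroup directory
   * @param configured file name from a hypothetical yarn-site setting, or null if unset
   */
  static String choosePidFile(Path cgroupDir, String configured) {
    // Option 4: an explicit admin setting wins.
    if (configured != null && !configured.isEmpty()) {
      return configured;
    }
    // Option 3: prefer cgroup.procs only if this process may write it.
    if (Files.isWritable(cgroupDir.resolve(CGROUP_PROCS))) {
      return CGROUP_PROCS;
    }
    // Otherwise fall back to tasks, which is writable on the kernel in this report.
    return TASKS;
  }

  public static void main(String[] args) {
    Path dir = Paths.get("/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001");
    System.out.println(dir.resolve(choosePidFile(dir, null)));
  }
}
```

      One caveat on option 3: Files.isWritable checks access from the NodeManager's own user, while container-executor runs setuid, so the two can disagree. In particular, if the probing process runs as root, an access check may report a mode 0444 file as writable, so a stricter variant might inspect the permission bits or attempt the write and retry with tasks on failure.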

      Thoughts?

        Attachments

        1. YARN-799.0.patch
          0.9 kB
          Chris Riccomini


            People

            • Assignee: Chris Riccomini (criccomini)
              Reporter: Chris Riccomini (criccomini)
            • Votes: 0
              Watchers: 13
