Hadoop YARN / YARN-799

CgroupsLCEResourcesHandler tries to write to cgroup.procs


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha, 2.0.5-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: nodemanager
    • Labels: None

    Description

      The implementation of

      ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

      tells the container-executor to write PIDs to cgroup.procs:

        public String getResourcesOption(ContainerId containerId) {
          String containerName = containerId.toString();
          StringBuilder sb = new StringBuilder("cgroups=");
      
          if (isCpuWeightEnabled()) {
            sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
            sb.append(",");
          }
      
          if (sb.charAt(sb.length() - 1) == ',') {
            sb.deleteCharAt(sb.length() - 1);
          } 
          return sb.toString();
        }
      

      Apparently, this file has not always been writeable:

      https://patchwork.kernel.org/patch/116146/
      http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
      https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

      The RHEL kernel that I'm using has a cgroup implementation in which cgroup.procs is not writeable.

      $ uname -a
      Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

      As a result, when the container-executor tries to run, it fails with the error produced by this statement:

      fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

      This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

      $ pwd
      /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
      $ ls -l
      total 0
      -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
      -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

      I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
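      For illustration only, here is a minimal, self-contained sketch of how the option string changes when the file name is swapped. It is not the actual patch: the class CgroupOptionSketch and the helper resourcesOption are hypothetical names, and the real handler derives the path from pathForCgroup(CONTROLLER_CPU, containerName).

```java
public class CgroupOptionSketch {

  // Hypothetical helper: builds the "cgroups=" option for a single
  // controller directory, with the PID file name passed in explicitly.
  static String resourcesOption(String cgroupDir, String pidFile) {
    return "cgroups=" + cgroupDir + "/" + pidFile;
  }

  public static void main(String[] args) {
    // Directory taken from the listing in this report.
    String dir = "/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001";

    // Current behaviour: points the container-executor at cgroup.procs.
    System.out.println(resourcesOption(dir, "cgroup.procs"));

    // Patched behaviour: points it at tasks instead.
    System.out.println(resourcesOption(dir, "tasks"));
  }
}
```

      One difference to keep in mind: in cgroup v1, writing an ID to tasks moves a single thread, while writing to cgroup.procs moves the whole thread group. Whether that matters here depends on when the container-executor writes its PID relative to starting the container process.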

      I can think of several potential resolutions to this ticket:

      1. Ignore the problem, and make people patch YARN when they hit this issue.
      2. Write to /tasks instead of /cgroup.procs for everyone.
      3. Check permissions on /cgroup.procs prior to writing to it, and fall back to /tasks if it is not writeable.
      4. Add a config to yarn-site that lets admins specify which file to write to.
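
      As a rough sketch of option 3, the check could look something like the following. The class CgroupPidFileChooser and the method choosePidFile are hypothetical names introduced for illustration, not existing YARN code. The sketch assumes the check runs as the same unprivileged user that owns the cgroup directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupPidFileChooser {

  // Hypothetical helper: prefer cgroup.procs when the current user can
  // write to it, otherwise fall back to tasks.
  // Files.isWritable also returns false when the file does not exist.
  static String choosePidFile(Path cgroupDir) {
    Path procs = cgroupDir.resolve("cgroup.procs");
    return Files.isWritable(procs) ? "cgroup.procs" : "tasks";
  }

  public static void main(String[] args) {
    // Default directory taken from the listing in this report;
    // pass a different directory as the first argument to test locally.
    Path dir = Paths.get(args.length > 0
        ? args[0]
        : "/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001");

    String pidFile = choosePidFile(dir);
    System.out.println("cgroups=" + dir + "/" + pidFile);
  }
}
```

      A caveat with this approach: a permission check may not be reliable for a privileged process, since root usually passes write-access checks regardless of the file mode, while the actual write to cgroup.procs can still be rejected by the kernel. Trying the write and falling back on failure, or option 4, would avoid depending on the mode bits.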

      Thoughts?

          People

            Assignee: Chris Riccomini (criccomini)
            Reporter: Chris Riccomini (criccomini)
            Votes: 0
            Watchers: 12