Hadoop YARN / YARN-7879

NM user is unable to access the application filecache due to permissions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I noticed the following log entries where localization was being retried on several MR AM files.

      2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again
      2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again
      

      The cluster is configured to use the LinuxContainerExecutor (LCE), and yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user is set to a user (hadoopuser) that is in the hadoop group. The user has a umask of 0002. The cluster is configured with fs.permissions.umask-mode=022, coming from core-default. Setting the local-user to nobody, which is neither a login user nor a member of the hadoop group, produces the same results.

      [hadoopuser@y7001 ~]$ umask
      0002
      [hadoopuser@y7001 ~]$ id
      uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
      

      The cause of the log entry was tracked down to a simple !file.exists() call in LocalResourcesTrackerImpl#isResourcePresent.

        public boolean isResourcePresent(LocalizedResource rsrc) {
          boolean ret = true;
          if (rsrc.getState() == ResourceState.LOCALIZED) {
            File file = new File(rsrc.getLocalPath().toUri().getRawPath().
              toString());
            if (!file.exists()) {
              ret = false;
            } else if (dirsHandler != null) {
              ret = checkLocalResource(rsrc);
            }
          }
          return ret;
        }
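
      One detail of the check above is worth spelling out: java.io.File#exists returns false both when a file is absent and when it cannot be examined, for example because a parent directory is not searchable by the calling user. The following is a minimal sketch of how the two cases could be told apart using java.nio.file.Files, whose exists and notExists methods both return false when the status cannot be determined. ResourceProbe and Presence are hypothetical names used for illustration only; this is not Hadoop code and not necessarily what the attached patch does.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class ResourceProbe {
  enum Presence { PRESENT, MISSING, UNKNOWN }

  /**
   * Classifies a path as present, missing, or undeterminable.
   * Files.exists and Files.notExists are not complements: when the
   * file's status cannot be determined (e.g. permission denied on a
   * parent directory), both return false.
   */
  static Presence probe(Path path) {
    if (Files.exists(path)) {
      return Presence.PRESENT;
    }
    if (Files.notExists(path)) {
      return Presence.MISSING;
    }
    return Presence.UNKNOWN;
  }
}
```

      Calling ResourceProbe.probe on a localized resource path as a user who cannot search its parent directory would be expected to yield UNKNOWN rather than MISSING, whereas a bare !file.exists() treats both cases as "missing".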
      

      The Resources Tracker runs as the NM user, in this case yarn. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user, carry the local-user's primary group, and have mode 700, which makes them unreadable by the yarn user.

      [root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
      total 0
      drwx--x---. 6 hadoopuser hadoop     46 Feb  2 03:06 .
      drwxr-s---. 4 hadoopuser hadoop     73 Feb  2 03:07 ..
      drwx------. 2 hadoopuser hadoopuser 61 Feb  2 03:05 10
      drwx------. 3 hadoopuser hadoopuser 21 Feb  2 03:05 11
      drwx------. 2 hadoopuser hadoopuser 45 Feb  2 03:06 12
      drwx------. 2 hadoopuser hadoopuser 41 Feb  2 03:06 13
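
      The listing above can be checked against the basic POSIX owner/group/other rules. The sketch below is a simplified model only: it ignores ACLs, SELinux, the setgid bit, and root's ability to bypass permission checks. DirAccess and canSearch are hypothetical names, and the assumption that the yarn user belongs to the hadoop group (but not to the hadoopuser group) is not stated explicitly in this report.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class DirAccess {
  /**
   * Returns true if a non-root user may search (traverse) a directory
   * with the given owner, group, and 9-character mode string such as
   * "rwx--x---". Exactly one permission class applies: owner first,
   * then group, then other.
   */
  static boolean canSearch(String user, Set<String> userGroups,
                           String owner, String group, String mode) {
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
    if (user.equals(owner)) {
      return perms.contains(PosixFilePermission.OWNER_EXECUTE);
    }
    if (userGroups.contains(group)) {
      return perms.contains(PosixFilePermission.GROUP_EXECUTE);
    }
    return perms.contains(PosixFilePermission.OTHERS_EXECUTE);
  }
}
```

      Under those assumptions, canSearch("yarn", Set.of("hadoop"), "hadoopuser", "hadoop", "rwx--x---") holds for the filecache directory itself, while canSearch("yarn", Set.of("hadoop"), "hadoopuser", "hadoopuser", "rwx------") does not hold for the numbered subdirectories, so the yarn user cannot see the files inside them.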
      

      I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant. Is the above already known, or is it a configuration issue?

      Attachments

        1. YARN-7879.001.patch (2 kB, Jason Darrell Lowe)

            People

              Assignee: Jason Darrell Lowe (jlowe)
              Reporter: Shane Kumpf (shanekumpf@gmail.com)