Description
I noticed the following log entries where localization was being retried on several MR AM files.
2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again
2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again
The cluster is configured to use LCE, and yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user is set to a user (hadoopuser) that is in the hadoop group. The user has a umask of 0002. The cluster is configured with fs.permissions.umask-mode=022, coming from core-default. Setting the local-user to nobody, who is neither a login user nor in the hadoop group, produces the same results.
[hadoopuser@y7001 ~]$ umask
0002
[hadoopuser@y7001 ~]$ id
uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
The cause of the log entry was tracked down to a simple !file.exists() call in LocalResourcesTrackerImpl#isResourcePresent.
public boolean isResourcePresent(LocalizedResource rsrc) {
  boolean ret = true;
  if (rsrc.getState() == ResourceState.LOCALIZED) {
    File file = new File(rsrc.getLocalPath().toUri().getRawPath().toString());
    if (!file.exists()) {
      ret = false;
    } else if (dirsHandler != null) {
      ret = checkLocalResource(rsrc);
    }
  }
  return ret;
}
The resource tracker runs as the NM user, in this case yarn. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user and the local-user's primary group, with 700 permissions, which makes them inaccessible to the yarn user. As a result, file.exists() returns false for files that are actually present.
[root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
total 0
drwx--x---. 6 hadoopuser hadoop     46 Feb 2 03:06 .
drwxr-s---. 4 hadoopuser hadoop     73 Feb 2 03:07 ..
drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10
drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11
drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12
drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13
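To make the permission reasoning concrete, here is a minimal sketch of the owner/group/other check that decides whether a user can traverse a directory. It is only an illustration: TraverseCheck and canTraverse are hypothetical names, not Hadoop code, and the model ignores root, ACLs, and SELinux. It also assumes the yarn user is a member of the hadoop group, which is not shown in the output above.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop.
final class TraverseCheck {

    // Simplified POSIX rule: exactly one permission class applies to a user
    // (owner first, then group, then other). Ignores root, ACLs, and SELinux.
    static boolean canTraverse(String mode, String owner, String group,
                               String user, Set<String> userGroups) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
        if (user.equals(owner)) {
            return perms.contains(PosixFilePermission.OWNER_EXECUTE);
        }
        if (userGroups.contains(group)) {
            return perms.contains(PosixFilePermission.GROUP_EXECUTE);
        }
        return perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        // Assumption: the NM user (yarn) belongs to the hadoop group.
        Set<String> yarnGroups = Set.of("yarn", "hadoop");

        // filecache itself: drwx--x--- hadoopuser:hadoop
        System.out.println(canTraverse("rwx--x---", "hadoopuser", "hadoop",
                                       "yarn", yarnGroups)); // true

        // filecache/11: drwx------ hadoopuser:hadoopuser
        System.out.println(canTraverse("rwx------", "hadoopuser", "hadoopuser",
                                       "yarn", yarnGroups)); // false
    }
}
```

Under these assumptions, yarn can enter filecache but not filecache/11, so a java.io.File#exists check on filecache/11/job.jar made as yarn would report false even though the file is on disk.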
I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant. Is this already known, or is it a configuration issue?
Attachments
Issue Links
- is broken by
  - YARN-2185 Use pipes when localizing archives (Resolved)