Hadoop Map/Reduce / MAPREDUCE-4557

With default settings, log aggregation service creates aggregated log dirs with ownership not matching JH server run-as user and group



    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None
    • Environment: CDH 4.0.1 (ZK, HDFS, YARN, HBase) managed by Cloudera Manager 4.0.3


      To read aggregated logs, the JobHistory (JH) server, which runs as mapred:hadoop by default, tries to access hdfs://<host:port>/tmp/logs/<user>/logs/<appId>/...

      NodeManager runs as yarn:hadoop by default, but it initially creates /tmp/logs as user yarn and leaves the group unchanged. E.g., if /tmp has ownership hdfs:supergroup, /tmp/logs will have ownership yarn:supergroup.

      When a job runs, LogAggregationService creates /tmp/logs/<username> as the user who submitted the job and again leaves the group unchanged, e.g., /tmp/logs/<user> will have ownership <user>:supergroup and permissions 750.

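      As a result, the JH server, which runs as user and group mapred:hadoop by default, cannot access the aggregated logs: it is neither the owner nor a member of supergroup, and permissions 750 grant nothing to other users.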
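The access failure can be illustrated with a small model of a POSIX-style permission check. This is only a sketch, not Hadoop code: `Dir`, `can_list`, and the user name `alice` (standing in for <user>) are hypothetical, the JH server is assumed to belong only to group hadoop, and the HDFS superuser bypass is not modeled.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Dir:
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o750


def can_list(user: str, groups: Set[str], d: Dir) -> bool:
    """Return True if `user` may list and enter `d` (needs both r and x)."""
    if user == d.owner:
        bits = (d.mode >> 6) & 0o7   # owner bits apply
    elif d.group in groups:
        bits = (d.mode >> 3) & 0o7   # group bits apply
    else:
        bits = d.mode & 0o7          # "other" bits apply
    return bits & 0o5 == 0o5


# /tmp/logs/<user> as described in the report ("alice" is a placeholder).
user_dir = Dir(owner="alice", group="supergroup", mode=0o750)

print(can_list("alice", {"alice"}, user_dir))    # job submitter -> True
print(can_list("mapred", {"hadoop"}, user_dir))  # JH server     -> False
```

With the same mode but group hadoop on the directory, the second check would pass, because the group bits of 750 grant r-x.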

      I'm not sure what a good way of fixing this would be.

      There does not seem to be a way to fix this behavior through the configuration. While run-as groups can be specified, they do not seem to affect the created directories.

      LogAggregationService should probably use the NodeManager's run-as user AND group (which default to yarn:hadoop) to create /tmp/logs rather than leave the group unchanged.

      The user and app dirs, on the other hand, are probably best created with the group left unchanged, so that they pick up the group of /tmp/logs (i.e., hadoop).
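The effect of this proposal can be sketched with a toy model of how ownership propagates on directory creation. It assumes the HDFS rule that a new directory is owned by the creating user and takes its group from the parent directory unless a group is set explicitly. `Entry`, `create_dir`, and the user `alice` are hypothetical names used only for illustration; this is not the LogAggregationService implementation.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    owner: str
    group: str


def create_dir(parent: Entry, creator: str, group: Optional[str] = None) -> Entry:
    """New dir: owner is the creator; group is inherited unless set explicitly."""
    return Entry(owner=creator, group=group or parent.group)


tmp = Entry("hdfs", "supergroup")  # /tmp as in the report's example

# Behaviour described in the report: the group is never changed.
logs_now = create_dir(tmp, "yarn")        # /tmp/logs
user_now = create_dir(logs_now, "alice")  # /tmp/logs/<user>

# Proposed: set the group only on /tmp/logs, then let the subdirs inherit it.
logs_fixed = create_dir(tmp, "yarn", group="hadoop")
user_fixed = create_dir(logs_fixed, "alice")
app_fixed = create_dir(user_fixed, "alice")  # app dir below the user dir

print(user_now)    # Entry(owner='alice', group='supergroup')
print(user_fixed)  # Entry(owner='alice', group='hadoop')
```

Under this model only the creation of /tmp/logs has to change; with group hadoop inherited all the way down, permissions 750 would already give the JH server read and execute access.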





            Assignee: Unassigned
            Reporter: Martin Gerlach (martin.gerlach)



