Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1435

symlinks in cwd of the task are not handled properly after MAPREDUCE-896

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixes bugs in linux task controller and TaskRunner.setupWorkDir() related to handling of symlinks.

      Description

      With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and does a fs.delete on each path listed. If the listed file is a symlink to directory, it will delete the contents of those linked directories. This would delete files from distributed cache and jars directory,if mapred.create.symlink is true.
      Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP would change ownership/permissions of underlying files.

      This is observed by Karam while running streaming jobs with DistributedCache and jvm reuse.

      1. MR-1435-y20s-1.txt
        21 kB
        Hemanth Yamijala
      2. MR-1435-y20s.patch
        13 kB
        Hemanth Yamijala
      3. 1435.v4.patch
        27 kB
        Hemanth Yamijala
      4. 1435.v3.patch
        25 kB
        Hemanth Yamijala
      5. 1435.v2.patch
        19 kB
        Ravi Gummadi
      6. 1435.v1.patch
        12 kB
        Ravi Gummadi
      7. 1435.patch
        7 kB
        Ravi Gummadi

        Issue Links

          Activity

          Hide
          Arun C Murthy added a comment -

          We suspect this was caused by MAPREDUCE-896.

          Show
          Arun C Murthy added a comment - We suspect this was caused by MAPREDUCE-896 .
          Hide
          Ravi Gummadi added a comment -

          Attaching patch for trunk fixing the issues in TaskRunner.setupWorkDir() and in task-controller.c

          This patch assumes that the api FileUtil.fullyDeleteContents() is available(that is part of HADOOP-6531).

          Please review and provide your comments.

          Show
          Ravi Gummadi added a comment - Attaching patch for trunk fixing the issues in TaskRunner.setupWorkDir() and in task-controller.c This patch assumes that the api FileUtil.fullyDeleteContents() is available(that is part of HADOOP-6531 ). Please review and provide your comments.
          Hide
          Ravi Gummadi added a comment -

          Missed removing unused method deleteDirContents() from TaskRunner.java in earlier patch. Now removed that method and the unnecessary test TestSetupWorkDir.java

          Show
          Ravi Gummadi added a comment - Missed removing unused method deleteDirContents() from TaskRunner.java in earlier patch. Now removed that method and the unnecessary test TestSetupWorkDir.java
          Hide
          Hemanth Yamijala added a comment -

          As discussed offline, we probably need a unit test coverage for the changes in TaskRunner.setupWorkDir as well. Can you please include that ? Also, we may want to add a public distributed cache file in the streaming test case and ensure its permissions do not change too.

          Show
          Hemanth Yamijala added a comment - As discussed offline, we probably need a unit test coverage for the changes in TaskRunner.setupWorkDir as well. Can you please include that ? Also, we may want to add a public distributed cache file in the streaming test case and ensure its permissions do not change too.
          Hide
          Ravi Gummadi added a comment -

          Attaching patch with review comments incorporated.

          Show
          Ravi Gummadi added a comment - Attaching patch with review comments incorporated.
          Hide
          Ravi Gummadi added a comment -

          TestFileArgs failed(known issue MAPREDUCE-1375) and all other unit tests passed on my local machine.

          Show
          Ravi Gummadi added a comment - TestFileArgs failed(known issue MAPREDUCE-1375 ) and all other unit tests passed on my local machine.
          Hide
          Hemanth Yamijala added a comment -

          A few minor nits:

          • Typo: contets in comments of ClusterWithLinuxTaskController.
          • checkPermissionsOnPrivateDistCache: Do we expect at least one of the directories to have the file. If yes, we should check for that; otherwise we will mask errors. Similarly for checkPermissionsOnPublicDistCache
          • checkPublicFilePermissions seems to be verifying permissions only for others. Should we also check permissions for owner and group ?
          • Comments in createSubDirsAndSymLinks should be dir1/subDir, dir1/file
          • Do we need DefaultTaskController.enableTaskForCleanup to give rwx access to group ? Setting the permissions for user should be sufficient, I think.
          Show
          Hemanth Yamijala added a comment - A few minor nits: Typo: contets in comments of ClusterWithLinuxTaskController. checkPermissionsOnPrivateDistCache: Do we expect at least one of the directories to have the file. If yes, we should check for that; otherwise we will mask errors. Similarly for checkPermissionsOnPublicDistCache checkPublicFilePermissions seems to be verifying permissions only for others. Should we also check permissions for owner and group ? Comments in createSubDirsAndSymLinks should be dir1/subDir, dir1/file Do we need DefaultTaskController.enableTaskForCleanup to give rwx access to group ? Setting the permissions for user should be sufficient, I think.
          Hide
          Hemanth Yamijala added a comment -

          Preliminary patch for earlier version of Hadoop. Not for commit.

          Show
          Hemanth Yamijala added a comment - Preliminary patch for earlier version of Hadoop. Not for commit.
          Hide
          Hemanth Yamijala added a comment -

          Attached patch addresses my review comments.

          checkPermissionsOnPrivateDistCache: Do we expect at least one of the directories to have the file. If yes, we should check for that; otherwise we will mask errors. Similarly for checkPermissionsOnPublicDistCache

          I have introduced new APIs in ClusterWithLinuxTaskController to check that a list of file names are present among the localized files. Essentially, I separated the permissions checking with the presence checks.

          checkPublicFilePermissions seems to be verifying permissions only for others. Should we also check permissions for owner and group ?

          Similar code existed in TestTrackerDistributedCacheManager. I refactored this code to a new API in the same class so that it can be called from outside and I am using it in checkPublicFilePermissions. I added checks for owner and group in TestTrackerDistributedCacheManager.

          Show
          Hemanth Yamijala added a comment - Attached patch addresses my review comments. checkPermissionsOnPrivateDistCache: Do we expect at least one of the directories to have the file. If yes, we should check for that; otherwise we will mask errors. Similarly for checkPermissionsOnPublicDistCache I have introduced new APIs in ClusterWithLinuxTaskController to check that a list of file names are present among the localized files. Essentially, I separated the permissions checking with the presence checks. checkPublicFilePermissions seems to be verifying permissions only for others. Should we also check permissions for owner and group ? Similar code existed in TestTrackerDistributedCacheManager. I refactored this code to a new API in the same class so that it can be called from outside and I am using it in checkPublicFilePermissions. I added checks for owner and group in TestTrackerDistributedCacheManager.
          Hide
          Hemanth Yamijala added a comment -

          Running through Hudson.

          Show
          Hemanth Yamijala added a comment - Running through Hudson.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12437630/1435.v3.patch
          against trunk revision 918206.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12437630/1435.v3.patch against trunk revision 918206. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/15/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          -1 core tests. The patch failed core unit tests.

          The core test failure is being tracked in MAPREDUCE-1520.

          Show
          Hemanth Yamijala added a comment - -1 core tests. The patch failed core unit tests. The core test failure is being tracked in MAPREDUCE-1520 .
          Hide
          Amareshwari Sriramadasu added a comment -

          In TestStreamingAsDifferentUser, group ownership is compared with primary group of the launcher. It would fail if task-controller binary is owned by secondary group of the launcher. It can be compared with ClusterWithLinuxTaskController.taskTrackerSpecialGroup directly, which introduced in MAPREDUCE-1421.

          Show
          Amareshwari Sriramadasu added a comment - In TestStreamingAsDifferentUser, group ownership is compared with primary group of the launcher. It would fail if task-controller binary is owned by secondary group of the launcher. It can be compared with ClusterWithLinuxTaskController.taskTrackerSpecialGroup directly, which introduced in MAPREDUCE-1421 .
          Hide
          Ravi Gummadi added a comment -

          Would it be better if we check if the owner and group of public distributed cache files are correct also ?

          Once we add that also to the checks of public distributed cache files, then ClusterWithLinuxTaskController.checkPermissionsOnDir() can be reused for these checks also and can avoid TestTrackerDistributedCacheManager.checkPublicFilePermissions() possibly.

          Show
          Ravi Gummadi added a comment - Would it be better if we check if the owner and group of public distributed cache files are correct also ? Once we add that also to the checks of public distributed cache files, then ClusterWithLinuxTaskController.checkPermissionsOnDir() can be reused for these checks also and can avoid TestTrackerDistributedCacheManager.checkPublicFilePermissions() possibly.
          Hide
          Hemanth Yamijala added a comment -

          Canceling patch to incorporate review comments.

          Show
          Hemanth Yamijala added a comment - Canceling patch to incorporate review comments.
          Hide
          Hemanth Yamijala added a comment -

          Patch incorporates review comments from Amarsri and Ravi. Changes are:

          • I am now using ClusterWithLinuxTaskController.taskTrackerSpecialGroup as the expected group for private distributed cache files.
          • Added ownership and group ownership checks for public distributed cache files. Group owner for public distributed cache is the primary owner of the tasktracker. I added a ClusterWithLinuxTaskController.taskTrackerPrimaryGroup on similar lines as ClusterWithLinuxTaskController.taskTrackerSpecialGroup.

          However,

          Once we add that also to the checks of public distributed cache files, then ClusterWithLinuxTaskController.checkPermissionsOnDir() can be reused for these checks also and can avoid TestTrackerDistributedCacheManager.checkPublicFilePermissions() possibly.

          I have not done the above. This is because the checks for permissions of private distributed cache files includes exact match of all the permissions for owner, group and others. For public distributed cache files, the code only adds 'read' and 'execute' bits for all users. Specifically, it does not modify the 'write' bits. This means that the write permissions are indeterminate (for e.g. they could depend on permissions of files in an archive which are unarchived in distributed cache). Hence, instead of reusing the model for checking permissions, I have retained the original model for checking permissions of the public cache files.

          I ran all task-controller tests on this, and they passed.

          Show
          Hemanth Yamijala added a comment - Patch incorporates review comments from Amarsri and Ravi. Changes are: I am now using ClusterWithLinuxTaskController.taskTrackerSpecialGroup as the expected group for private distributed cache files. Added ownership and group ownership checks for public distributed cache files. Group owner for public distributed cache is the primary owner of the tasktracker. I added a ClusterWithLinuxTaskController.taskTrackerPrimaryGroup on similar lines as ClusterWithLinuxTaskController.taskTrackerSpecialGroup. However, Once we add that also to the checks of public distributed cache files, then ClusterWithLinuxTaskController.checkPermissionsOnDir() can be reused for these checks also and can avoid TestTrackerDistributedCacheManager.checkPublicFilePermissions() possibly. I have not done the above. This is because the checks for permissions of private distributed cache files includes exact match of all the permissions for owner, group and others. For public distributed cache files, the code only adds 'read' and 'execute' bits for all users. Specifically, it does not modify the 'write' bits. This means that the write permissions are indeterminate (for e.g. they could depend on permissions of files in an archive which are unarchived in distributed cache). Hence, instead of reusing the model for checking permissions, I have retained the original model for checking permissions of the public cache files. I ran all task-controller tests on this, and they passed.
          Hide
          Hemanth Yamijala added a comment -

          Running through Hudson.

          Show
          Hemanth Yamijala added a comment - Running through Hudson.
          Hide
          Hemanth Yamijala added a comment -

          Output of test-patch:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 18 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          Show
          Hemanth Yamijala added a comment - Output of test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 18 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Ravi Gummadi added a comment -

          Patch looks good.
          +1

          Show
          Ravi Gummadi added a comment - Patch looks good. +1
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12437873/1435.v4.patch
          against trunk revision 918864.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12437873/1435.v4.patch against trunk revision 918864. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/500/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          I committed this patch to trunk. Thanks, Ravi !

          Show
          Hemanth Yamijala added a comment - I committed this patch to trunk. Thanks, Ravi !
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #261 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/261/)
          . Fix symlink handling in task work directory when cleaning up, essentially to avoid following links. Contributed by Ravi Gummadi.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #261 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/261/ ) . Fix symlink handling in task work directory when cleaning up, essentially to avoid following links. Contributed by Ravi Gummadi.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #250 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/250/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #250 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/250/ )
          Hide
          Hemanth Yamijala added a comment -

          This patch is over MR-1435-y20s.patch for earlier version of Hadoop. It improves test cases by adding a couple of more tests, synchronizing this with trunk's patch. I am running tests. This patch is not for commit here.

          Show
          Hemanth Yamijala added a comment - This patch is over MR-1435-y20s.patch for earlier version of Hadoop. It improves test cases by adding a couple of more tests, synchronizing this with trunk's patch. I am running tests. This patch is not for commit here.

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development