Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-563 Security features for Map/Reduce
  3. MAPREDUCE-744

Support in DistributedCache to share cache files with other users after HADOOP-4493

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Fixed DistributedCache to support sharing of the local cache files with other users on the same TaskTracker. The cache files are checked at the client side for public/private access on the file system, and that information is passed in the configuration. The TaskTrackers look at the configuration for each file during task localization, and, if the file was public on the filesystem, they are localized to a common space for sharing by all users' tasks on the TaskTracker. Else the file is localized to the user's private directory on the local filesystem.
      Show
      Fixed DistributedCache to support sharing of the local cache files with other users on the same TaskTracker. The cache files are checked at the client side for public/private access on the file system, and that information is passed in the configuration. The TaskTrackers look at the configuration for each file during task localization, and, if the file was public on the filesystem, they are localized to a common space for sharing by all users' tasks on the TaskTracker. Else the file is localized to the user's private directory on the local filesystem.

      Description

      HADOOP-4493 aims to completely privatize the files distributed to TT via DistributedCache. This jira issues focuses on sharing some/all of these files with all other users.

      1. 744-early.patch
        22 kB
        Devaraj Das
      2. 744-6-y20.patch
        27 kB
        Hemanth Yamijala
      3. 744-6.patch
        26 kB
        Devaraj Das
      4. 744-5.patch
        26 kB
        Devaraj Das
      5. 744-4.patch
        30 kB
        Devaraj Das
      6. 744-3.patch
        28 kB
        Devaraj Das
      7. 744-2.patch
        28 kB
        Devaraj Das
      8. 744-1.patch
        28 kB
        Devaraj Das

        Issue Links

          Activity

          Hide
          Devaraj Das added a comment -

          Attaching a preliminary patch for review. What's done there is that the cache files are checked at the client side for public/private access, and that information (booleans - true for public, false for private) is passed in the configuration. The TaskTrackers look at the configuration for each file during localization, and, if the file was public they localize it to a common space. If not, then the file is localized to the user's private directory.
          Testcase is not there yet.

          Show
          Devaraj Das added a comment - Attaching a preliminary patch for review. What's done there is that the cache files are checked at the client side for public/private access, and that information (booleans - true for public, false for private) is passed in the configuration. The TaskTrackers look at the configuration for each file during localization, and, if the file was public they localize it to a common space. If not, then the file is localized to the user's private directory. Testcase is not there yet.
          Hide
          Devaraj Das added a comment -

          I should add that "public" means world-readable. The entire hierarchy of the cache file path is checked for that (starting from the leaf filename to "/").

          Show
          Devaraj Das added a comment - I should add that "public" means world-readable. The entire hierarchy of the cache file path is checked for that (starting from the leaf filename to "/").
          Hide
          Devaraj Das added a comment -

          Attaching a more complete patch.

          Show
          Devaraj Das added a comment - Attaching a more complete patch.
          Hide
          Devaraj Das added a comment -

          Passing through hudson

          Show
          Devaraj Das added a comment - Passing through hudson
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427588/744-1.patch
          against trunk revision 889085.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427588/744-1.patch against trunk revision 889085. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/console This message is automatically generated.
          Hide
          Devaraj Das added a comment -

          Fixed the testcase issue to do with TestTrackerDistributedCacheManagerWithLinuxTaskController. The other failing tests are not related to this patch.

          Show
          Devaraj Das added a comment - Fixed the testcase issue to do with TestTrackerDistributedCacheManagerWithLinuxTaskController. The other failing tests are not related to this patch.
          Hide
          Devaraj Das added a comment -

          Retrying hudson

          Show
          Devaraj Das added a comment - Retrying hudson
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427656/744-2.patch
          against trunk revision 889085.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427656/744-2.patch against trunk revision 889085. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/console This message is automatically generated.
          Hide
          Devaraj Das added a comment -

          I realized that the check for whether a path on the hdfs is public or not should check for EXECUTE permission for Other user for the subdirs in the path. Attached patch fixes that.

          Show
          Devaraj Das added a comment - I realized that the check for whether a path on the hdfs is public or not should check for EXECUTE permission for Other user for the subdirs in the path. Attached patch fixes that.
          Hide
          Boris Shkolnik added a comment -

          Some comments from review:

          1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods?
          2. Do we need separate call for determineFile/ArchiveVisability. It can be determined while calling DistributedCache.addCacheArchive/File()
          3. private static isPublic() should be called differently - something like haveOtherPermissions(Perm), to match its functionality. Some comments would help too.
          4. public methods should have javadoc comments (determineCacheVisibilities() in TrackerDistributedCacheManager)
          5. this line: File workDir = new File(new Path(TEST_ROOT_DIR, "workdir").toString());
          can be changed to File workDir = new File(TEST_ROOT_DIR, "workdir") in Tasktrackerdistributedcachemanager

          Show
          Boris Shkolnik added a comment - Some comments from review: 1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods? 2. Do we need separate call for determineFile/ArchiveVisability. It can be determined while calling DistributedCache.addCacheArchive/File() 3. private static isPublic() should be called differently - something like haveOtherPermissions(Perm), to match its functionality. Some comments would help too. 4. public methods should have javadoc comments (determineCacheVisibilities() in TrackerDistributedCacheManager) 5. this line: File workDir = new File(new Path(TEST_ROOT_DIR, "workdir").toString()); can be changed to File workDir = new File(TEST_ROOT_DIR, "workdir") in Tasktrackerdistributedcachemanager
          Hide
          Devaraj Das added a comment -

          1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods?

          MAPREDUCE-982 needs to be fixed for me to not have the deprecated methods. But i will think about if we can avoid it in any way.

          2. Do we need separate call for determineFile/ArchiveVisability. It can be determined while calling DistributedCache.addCacheArchive/File()

          I don't have a strong preference for this either way but it seems clearer to me to keep it the way i currently have..

          3. private static isPublic() should be called differently - something like haveOtherPermissions(Perm), to match its functionality. Some comments would help too.

          Done

          4. public methods should have javadoc comments (determineCacheVisibilities() in TrackerDistributedCacheManager)

          Done

          5. this line: File workDir = new File(new Path(TEST_ROOT_DIR, "workdir").toString()); can be changed

          Done

          Show
          Devaraj Das added a comment - 1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods? MAPREDUCE-982 needs to be fixed for me to not have the deprecated methods. But i will think about if we can avoid it in any way. 2. Do we need separate call for determineFile/ArchiveVisability. It can be determined while calling DistributedCache.addCacheArchive/File() I don't have a strong preference for this either way but it seems clearer to me to keep it the way i currently have.. 3. private static isPublic() should be called differently - something like haveOtherPermissions(Perm), to match its functionality. Some comments would help too. Done 4. public methods should have javadoc comments (determineCacheVisibilities() in TrackerDistributedCacheManager) Done 5. this line: File workDir = new File(new Path(TEST_ROOT_DIR, "workdir").toString()); can be changed Done
          Hide
          Devaraj Das added a comment -

          1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods?

          Ok it looks like we don't have to really add the methods in DistributedCache since the methods are used only by the framework. So i moved the accessors (getFileVisibilities/getArchiveVisibilities) to TrackerDistributedCacheManager where the setters are defined.
          This seems like the right thing to do ..

          Show
          Devaraj Das added a comment - 1. DistributedCache.get*Visibiliteies - are deprecated. Why do we add deprecated methods? Ok it looks like we don't have to really add the methods in DistributedCache since the methods are used only by the framework. So i moved the accessors (getFileVisibilities/getArchiveVisibilities) to TrackerDistributedCacheManager where the setters are defined. This seems like the right thing to do ..
          Hide
          Boris Shkolnik added a comment -

          Looks good.
          +1

          One small nit:
          TrackerDistributedCacheManager.isPublic and checkPermissionsofOther have empty @return javadoc. This may cause Javadoc warnings.

          Show
          Boris Shkolnik added a comment - Looks good. +1 One small nit: TrackerDistributedCacheManager.isPublic and checkPermissionsofOther have empty @return javadoc. This may cause Javadoc warnings.
          Hide
          Devaraj Das added a comment -

          Thanks Boris for the review. The attached patch has the javadoc fix. All tests, except the ones known to fail (TestStreamingExitStatus, TestStreamingKeyValue), and test-patch passed. I will commit this.

          Show
          Devaraj Das added a comment - Thanks Boris for the review. The attached patch has the javadoc fix. All tests, except the ones known to fail (TestStreamingExitStatus, TestStreamingKeyValue), and test-patch passed. I will commit this.
          Hide
          Devaraj Das added a comment -

          I just committed this.

          Show
          Devaraj Das added a comment - I just committed this.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/ )
          Hide
          Hemanth Yamijala added a comment -

          Patch for older version of hadoop. Not for commit here.

          Show
          Hemanth Yamijala added a comment - Patch for older version of hadoop. Not for commit here.

            People

            • Assignee:
              Devaraj Das
              Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development