Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently the job client writes job details directly into mapred.system.dir, so mapred.system.dir has to carry permissions of rwx-wx-wx (733). This is a security loophole: job files could be overwritten or tampered with after job submission.
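
      For illustration, a minimal sketch (assuming the standard Hadoop FileSystem API and that mapred.system.dir is set in the configuration) of checking the system directory's permissions and tightening them so that only the owning daemon user can touch the job files:

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.permission.FsPermission;

          public class TightenSystemDir {
            public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);
              Path systemDir = new Path(conf.get("mapred.system.dir"));
              // rwx-wx-wx (733) lets any user write into the directory;
              // 700 restricts it to the owning daemon user.
              FsPermission strict = new FsPermission((short) 0700);
              if (!fs.getFileStatus(systemDir).getPermission().equals(strict)) {
                fs.setPermission(systemDir, strict);
              }
            }
          }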

      1. HADOOP-3578-v2.6.patch
        67 kB
        Amar Kamat
      2. HADOOP-3578-v2.7.patch
        67 kB
        Amar Kamat
      3. hadoop-3578-branch-20-example.patch
        24 kB
        Amar Kamat
      4. hadoop-3578-branch-20-example-2.patch
        10 kB
        Amar Kamat
      5. MAPRED-181-v3.8.patch
        124 kB
        Amar Kamat
      6. MAPRED-181-v3.32.patch
        159 kB
        Amar Kamat
      7. 181-1.patch
        106 kB
        Devaraj Das
      8. 181-2.patch
        161 kB
        Devaraj Das
      9. 181-3.patch
        151 kB
        Devaraj Das
      10. 181-3.patch
        152 kB
        Devaraj Das
      11. 181-4.patch
        154 kB
        Devaraj Das
      12. 181-5.1.patch
        157 kB
        Devaraj Das
      13. 181-5.1.patch
        157 kB
        Devaraj Das
      14. 181-6.patch
        166 kB
        Devaraj Das
      15. 181-8.patch
        169 kB
        Devaraj Das
      16. 181.20.s.3.patch
        157 kB
        Devaraj Das
      17. jobclient.patch
        0.7 kB
        Devaraj Das
      18. 181.20.s.3.fix.patch
        0.4 kB
        Ravi Gummadi

        Issue Links

          Activity

          Ravi Gummadi added a comment -

          Fixing an issue with a changed config property name for an earlier version of Hadoop, on top of 181.20.s.3.patch. Not for commit here.

          Devaraj Das added a comment -

          Attaching a bugfix for using the right jobconf in the Y20 distribution. Not for commit here.

          Devaraj Das added a comment -

          The patch for the Yahoo! 0.20 branch (not to be committed).

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/)

          Devaraj Das added a comment -

          I just committed this. Thanks Amar for the initial patches on this one.

          Owen O'Malley added a comment -

          This looks good.

          +1

          Devaraj Das added a comment -

          All tests passed, apart from the known failures (TestStreamingExitStatus, TestStreamingKeyValue).

          Devaraj Das added a comment -

          This fixes Owen's offline comments about having a finite limit on the split meta info that the JobTracker reads. The other comment was about a typo in writJobSplitMetaInfo.
          I also fixed the testcases. Specifically, the differences with respect to the earlier patch are in
          1) TestSubmitJob.java / TestSeveral.java / ClusterWithLinuxTaskController.java, where I set up the staging area root directory with proper permissions so that job clients can create the ".staging" directories there.
          Other than that, a javadoc warning is fixed.

          I ran "test-patch" locally and it passed. "ant test" is in progress.

          Devaraj Das added a comment -

          In my local tests, I discovered that I had to make a bunch of changes to work around the extra checks that I introduced in the last patch. One of them: the check for ownership of the staging dir now includes a check against the UGI of the submitting user (otherwise, tests that fake the UGI were failing during job submission). I also introduced a method for getting the staging area location from the JobTracker (so that the user's home dir doesn't get clobbered with files in the .staging dir when tests are run).
          I am still testing this patch. With the server-side groups patch in, I might need to make some minor changes in the testcases for them to work with the new model of job submission. But this should mostly be good overall. Up for review.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12428376/181-5.1.patch
          against trunk revision 892178.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 78 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/218/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Sorry, the last patch had a silly bug in the new checks I introduced.

          Devaraj Das added a comment -

          Thanks for the review, Owen. This patch addresses the concerns. I also made one more change: the JobInProgress constructor now checks whether the username in the submitted jobconf is the same as the one obtained from the UGI, and if not, fails the job submission. Ideally, we should not use conf.getUser anywhere, but since it is used even in the TaskTracker code, I left it as it is and instead fail the job submission if the user strings from the two sources don't match.

          Owen O'Malley added a comment -

          I think that JobInfo should just contain the user as a Text. Otherwise, we'll end up with trouble with the upcoming changes to UGI.

          The job tracker should:

          1. fail to come up if the system directory is owned by the wrong user
          2. chmod it to 700, if it isn't already. (And log a warning about the change).

          In JobTracker.java, you have some spurious spacing changes.

          The job client's job submission should fail unless:

          1. the staging directory doesn't exist (it will be created with 700)
          2. the owner is the current user
          3. the permission isn't 700

          Let's make the client-side log messages about split generation debug instead of info.

          This is looking good. I'm really looking forward to having job submission be secure. :)
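
          A minimal sketch of the two JobTracker startup checks listed above, assuming the expected daemon user name is passed in (illustrative helper only, not the committed code):

              import java.io.IOException;
              import org.apache.commons.logging.Log;
              import org.apache.commons.logging.LogFactory;
              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.fs.permission.FsPermission;

              class SystemDirCheck {
                private static final Log LOG = LogFactory.getLog(SystemDirCheck.class);

                static void check(FileSystem fs, Path systemDir, String mrOwner) throws IOException {
                  FileStatus status = fs.getFileStatus(systemDir);
                  if (!status.getOwner().equals(mrOwner)) {
                    // 1. fail to come up if the system directory is owned by the wrong user
                    throw new IOException(systemDir + " is owned by " + status.getOwner()
                        + ", expected " + mrOwner);
                  }
                  FsPermission expected = new FsPermission((short) 0700);
                  if (!status.getPermission().equals(expected)) {
                    // 2. chmod it to 700 and log a warning about the change
                    LOG.warn("Changing permission of " + systemDir + " from "
                        + status.getPermission() + " to " + expected);
                    fs.setPermission(systemDir, expected);
                  }
                }
              }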

          Devaraj Das added a comment -

          On the failing tests, the failure of TestGridmixSubmission is a known issue. The other two tests don't fail on my local machine.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426876/181-4.patch
          against trunk revision 887096.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 78 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/289/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/289/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/289/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Attaching a patch with fixes for TestGridmixSubmission and TestMultipleInputs. I had forgotten to change those testcases in the earlier patches.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426616/181-3.patch
          against trunk revision 885530.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 75 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/284/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/284/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/284/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/284/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Corrected patch.

          Devaraj Das added a comment -

          My bad. My last patch had a silly change that led to the test failures.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426590/181-3.patch
          against trunk revision 885530.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 75 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/158/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/158/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/158/console

          This message is automatically generated.

          Devaraj Das added a comment -

          This patch fixes the findbugs warning and does some cleanup of the testcases.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426518/181-2.patch
          against trunk revision 885530.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 78 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/282/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/282/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/282/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/282/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Quite close, I think.

          Devaraj Das added a comment -

          Forgot to mention - this patch is a modified version of Amar's last patch.

          Devaraj Das added a comment -

          Uploading a patch for review. The patch has most of the core functionality changes (including changes to LocalJob/Isolation runners, and mumak). I am still fixing the testcases.

          Amar Kamat added a comment -

          Attaching an incomplete patch for 0.21 branch. SimulatorJobInProgress needs to be ported.

          Doug Cutting added a comment -

          > Today, we put version per file.

          I was suggesting that we could put the version for all files in a root-file like job.xml, to avoid adding a file just for the version. I personally would prefer that, and, if I were implementing it, would do it that way, but I am not implementing this and would not block this patch over that decision.

          Amar Kamat added a comment -

          Doug, Owen and I had a chat about how to enforce version control. The reason Owen is suggesting a version file per folder is that a change in any of the job submission files (e.g. job.xml -> job.bin, or job.split -> job.split + job.metainfo) should reject the whole job. Today, we put a version per file, but it is redundant to keep the same version info in every file.

          Doug Cutting added a comment -

          +1 This sounds good to me. I'd prefer the version not be a separate file, but would not reject this design over that.

          > at some point we will change the job conf from xml to binary. That isn't easy to do without a version on the directory.

          Wouldn't that be clear if it were named job.bin instead of job.xml? If job.bin does not exist then we'd look for job.xml. The version number could then be stored in the configuration. I don't see any disadvantages to this, and it would be nice not to add another file per job. Is there a reason I'm missing?

          Amar Kamat added a comment -

          Had a chat with Owen and here is the job submission process with a few extra add-ons:

          1. jobclient requests the jobtracker for a jobid [say $jobid]
          2. jobclient uploads job.xml, job.jar, job.split, job.splitmetainfo, version, libs, archives etc. to the staging area, i.e. ~/.staging/$jobid
          3. jobclient now constructs a job-submission-token which contains
            1. job staging area location (for job start and restart)
            2. job-submission version (for client-master compatibility)
            3. some checksum info (will expand on this later)
            4. user-credentials (for now username)
          4. jobclient passes job-submission-token over the rpc to jobtracker
          5. jobtracker persists this info in mapred.system.dir
          6. jobtracker uses the user-credentials in the job-meta-info to read the job.xml and job.splitmetainfo.
          7. jobtracker checks for job staging checksum
          8. when the tasktracker asks for a task, a Task is passed which contains the location of job.split along with start-offset and length.
          9. upon restart the jobtracker reads the job-meta info and re-submits the job (where the checksum check is done again)
          10. once the job is done, the staging area is deleted

          Checksum:

          1. job.xml md5 : this prevents jobtracker/tasktrackers from using a changed jobconf across job-submission and restarts.
          2. job-staging-area modification time : this prevents the jobtracker and tasktracker from running jobs for which the staging area has changed.
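
          As a small sketch of how the job.xml checksum carried in the job-submission-token could be computed with Hadoop's MD5Hash (illustrative only; the helper name is made up):

              import java.io.IOException;
              import java.io.InputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.io.MD5Hash;

              class JobConfChecksum {
                // MD5 over the staged job.xml bytes; the JobTracker can recompute and compare
                // this at submission time and again on restart to detect a changed jobconf.
                static MD5Hash of(FileSystem fs, Path stagedJobXml) throws IOException {
                  InputStream in = fs.open(stagedJobXml);
                  try {
                    return MD5Hash.digest(in);
                  } finally {
                    in.close();
                  }
                }
              }
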
          Owen O'Malley added a comment -

          > Why can't this be in the respective files as headers? Today we add the version info as the first line in the file.

          It would have to be in all of the files (job conf, raw split, split metadata). It seems easier to have a single version. In particular, at some point we will change the job conf from xml to binary. That isn't easy to do without a version on the directory.

          > So you mean to say that we just persist jobid and job-staging location for restart/persistence?

          Yes. The rest of the information would need to come from the staging directories. We should probably md5 the jobconf and verify it when it is downloaded by the task trackers and on restart.

          I guess I should have listed two more disadvantages:

          • the JobTracker needs to be the user to read the files from the staging area
          • the user can mess with their jobs after they are submitted

          Other than changing the job conf, I can't see any security problems with them changing any of the files.

          Amar Kamat added a comment -

          > _version that contains the storage version (1.0 to start with)

          Why can't this be in the respective files as headers? Today we add the version info as the first line in the file.

          > The JobTracker doesn't need to do any writes to HDFS, just reads

          So you mean to say that we just persist jobid and job-staging location for restart/persistence? Also, the jobtracker will be forced to do all the checks for the job upon restart, as the job files can change at any time. Also, this is a change from the current model where the files, once accepted, cannot change. The user can now change the jobconf while the job is running.

          Owen O'Malley added a comment -

          Ok, Arun and I discussed this offline and came up with the following proposal.

          We put everything about the job into the job's staging area (~/.staging/$jobid)

          • job conf
          • the serialized bytes of the input splits
          • the meta data for the splits (offset of split serialization, number of bytes in split, list of locations for split) for each split
          • job jar

          One last file that we need, because this effectively becomes an interface, is:

          • _version that contains the storage version (1.0 to start with)

          The advantages are:

          • The JobTracker doesn't need to do any writes to HDFS, just reads
          • The space counts against the user's quota on their home directory
          • Small RPC message
          • The job definition isn't split in two different places

          The disadvantages are:

          • Need versioning (so that hadoop 1.0 clients will work with hadoop 1.1 JobTrackers)
          • The job tracker is reading xml written by user code (need to move to binary eventually)
          • The user can accidentally kill all of their jobs.
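
          For illustration, a client-side sketch of creating the per-job staging area described above under the user's home directory, restricted to the submitting user (names here are illustrative, not the committed code):

              import java.io.IOException;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.fs.permission.FsPermission;

              class StagingDirs {
                // ~/.staging/$jobid would hold the job conf, job jar, the raw split bytes,
                // the split metadata and the _version file.
                static Path createJobStagingDir(FileSystem fs, String jobId) throws IOException {
                  Path jobDir = new Path(new Path(fs.getHomeDirectory(), ".staging"), jobId);
                  FileSystem.mkdirs(fs, jobDir, new FsPermission((short) 0700));
                  return jobDir;
                }
              }
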
          Doug Cutting added a comment -

          > I would be happier, if as part of JobSubmission, we moved the files from the user's staging area into the system dir.

          Why? It seems to me that we should minimize the JobTracker & TaskTracker's involvement, so that as much as possible happens in the task, written and read by the user.

          Devaraj Das added a comment -

          > Instead of storing the UGI with the submitted job, please store the user as a string. That will be forward compatible when we move to server-side groups. I think it makes sense to do as part of this patch, if it isn't already being done.

          The jobconf already has the username. Are you saying that the JT should maintain the mapping from the jobID to the username who was given this jobID (step 1 in the jobsubmission protocol), so that in the following RPC the JT would be able to efficiently look up the username based on the jobID, rather than having to parse the conf to get it?

          > The meta information should only include the offset, since the length is redundant with the following split's start.

          Hmm.. right.

          > We use the binary format instead of xml to store the jobconf. However, when loading the binary format, we need to handle the final parameters.

          The conf is serialized using Configuration's write(DataOutput), which actually serializes everything out as strings. The JobTracker then writes the read configuration in the mapred.system.dir using Configuration.writeXml. The JobInProgress constructor loads the conf in the normal way (in the way it happens today). So final parameters defined in the JobTracker will be taken care of in the usual way.

          > I'm not very happy with half of the job information being saved in the system directory and half of it in the staging directory. I assume that the staging directory is required to be on the same file system as the system directory? Having the job's definition split into two directories with two different owners seems bad. That is especially true since the data in the system directory will point to particular byte offsets in the staging directory. I think we will be in for some really nasty bugs involving

          The way I am seeing it is that the JobTracker is given only that piece of information that's required to launch the job. Things like job.jar, the split bytes, the distributed cache files, and anything else the users want to use in the job, are things required by the tasks, which the JT doesn't care about. Every piece of information is generated by the client. If the client had generated the wrong information about the byte offsets, only his job gets affected.
          Your sentence about the "nasty bugs" is incomplete.

          > I assume the cleanup of the staging directory is done by the JobTracker.

          Done as part of the job cleanup task.

          > I guess I would be happier, if as part of JobSubmission, we moved the files from the user's staging area into the system dir. The JobTracker would read (possibly with a cache) the bytes for the task and send them to the user as part of the task definition.

          The split bytes file has a high replication factor of 10 (and it could be something like what Doug suggested). So do we really want the JT to copy the bytes to the system dir? I am trying to weigh the options of letting the tasks read the split bytes from the split file directly versus the JT passing the same in the task definition. The former reduces load on the JT (it doesn't have to load the split bytes in memory at all).
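
          A sketch of the round trip described above, assuming the stock Configuration/JobConf serialization methods (write, readFields, writeXml); the helper and its arguments are illustrative:

              import java.io.IOException;
              import org.apache.hadoop.fs.FSDataOutputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.io.DataInputBuffer;
              import org.apache.hadoop.mapred.JobConf;

              class ConfPersistence {
                // JobTracker side: reconstruct the conf shipped over RPC (everything as strings)
                // and persist it as XML under mapred.system.dir so restarts can reload it.
                static void persistSubmittedConf(FileSystem fs, Path systemJobDir, byte[] rpcBytes)
                    throws IOException {
                  JobConf received = new JobConf();
                  DataInputBuffer in = new DataInputBuffer();
                  in.reset(rpcBytes, rpcBytes.length);
                  received.readFields(in);
                  FSDataOutputStream out = fs.create(new Path(systemJobDir, "job.xml"));
                  try {
                    received.writeXml(out);   // final parameters are applied when this is loaded again
                  } finally {
                    out.close();
                  }
                }
              }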

          Owen O'Malley added a comment -

          Instead of storing the UGI with the submitted job, please store the user as a string. That will be forward compatible when we move to server-side groups. I think it makes sense to do as part of this patch, if it isn't already being done.

          The meta information should only include the offset, since the length is redundant with the following split's start.

          We use the binary format instead of xml to store the jobconf. However, when loading the binary format, we need to handle the final parameters.

          I'm not very happy with half of the job information being saved in the system directory and half of it in the staging directory. I assume that the staging directory is required to be on the same file system as the system directory? Having the job's definition split into two directories with two different owners seems bad. That is especially true since the data in the system directory will point to particular byte offsets in the staging directory. I think we will be in for some really nasty bugs involving

          I assume the cleanup of the staging directory is done by the JobTracker.

          I guess I would be happier, if as part of JobSubmission, we moved the files from the user's staging area into the system dir. The JobTracker would read (possibly with a cache) the bytes for the task and send them to the user as part of the task definition.

          Doug Cutting added a comment -

          > We could either increase the rep degree of the split file [ ... ]

          We already increase the replication to 10 for job files. This could be made proportional, like distcp input file lists, whose replication is increased to sqrt(#slots), to implement an efficient 2-step fanout.
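
          Illustratively, the proportional replication Doug describes could look like the sketch below (the floor of 10 matches the current job-file replication; the helper itself is made up). A client could then raise the split file's replication with fs.setReplication(splitFile, SplitFileReplication.forCluster(slots)).

              class SplitFileReplication {
                // sqrt(#slots) fan-out, never below the current default of 10 for job files
                static short forCluster(int clusterSlots) {
                  return (short) Math.max(10, (int) Math.ceil(Math.sqrt(clusterSlots)));
                }
              }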

          Hong Tang added a comment -

          Sorry, I was not aware of this jira until Devaraj updated MAPREDUCE-841. One concern I have with pulling user splits from HDFS: there will be a surge of uncoordinated load from the first wave of maps (and subsequently the 2nd, 3rd, ... waves) all trying to read the same file from HDFS. We could either increase the rep degree of the split file (the number may need to depend on the size of the job), or we may still let the job tracker read the raw bytes of a split from local disk and pass them to the maps.

          Devaraj Das added a comment -

          For now, let's keep it simple - don't implement the points to do with maintaining/cleaning up the jobID->userName mappings. This should be looked at, in a bigger picture, once we have authentication implemented. Also, rather than time-based expiry, I think it would be better to have limits on the number of queued jobs per user and on the max queued jobs overall.

          Amar Kamat added a comment -

          Here is the final proposal:

          1. Here is how the handshake happens for job submission
            1. jobclient asks the jobtracker for a new jobid (jobtracker maintains a mapping from job-id to user-name [ugi]. This user is the owner of the job and will be allowed to submit the job)
            2. using the Input-split, the jobclient constructs a split meta-info for the jobtracker to be able to create the task->node locality cache.
                 job-split-meta-info :
                     - split-location (location of the actual split/raw-bytes)
                     - split class (used to reinstantiate the split object)
                     - split-info (array of individual split meta-info)
              
                 split-meta-info :
                     - locations (hostnames where this split is local)
                     - start offset (start in raw-bytes)
                     - length (total bytes in the corresponding raw-bytes)
                     - data-size : total data that will be processed in this split
                
            3. with this new id, the jobclient uploads job.xml, job.split, job.jar and archives/libs to a staging area (/user/user-name/.staging/jobid/). job.xml is staged to support the jobtracker.getJobFile() api.
            4. after the upload is done, the jobclient submits a job by passing job-id, job-conf and job-split-meta-info via rpc.
            5. jobtracker does the following things upon a submitjob request
              1. validate conf (includes queue check, acl checks etc along with user-name [conf.username and owner match] and ownership checks [caller of getnewid() and submitjob()])
              2. serialize conf to mapred.system.dir/jobid/job.xml (for restarts)
              3. serialize split-meta-info to mapred.system.dir/jobid/job.split
              4. starts the job i.e create jobinprogress
            6. when a tt comes asking for a task, the jobtracker passes the split-metainfo (along with split-location and split-classname). Tasktracker uses this metainfo for reading the split raw-bytes.
            7. tasktracker now localizes the job.jar from /user/user-name/.staging/job-id/job.jar and then unjars it. This is done using the job-conf (having user-credentials)
            8. mapred.system.dir can now be 700 and only accessible to mapred daemons
            9. readFields() in jobconf caps the total characters in jobconf. This prevents users from passing huge job-confs. For now the limit is 3*1024*1024 chars
            10. job-split metainfo is also capped in readFields() to accept split meta-info < 10mb.
            11. since jobtracker.getNewJobId() maintains a mapping from jobid to username, the jobtracker needs to clean up this mapping upon some timeout. One way to time out is to use a thread which periodically cleans up this mapping.
            12. Upon job completion, jobcleanup code cleans up the staging folder i.e /user/user-name/.staging/job-id/.
            13. if the jobclient crashes or fails to submit job then the temp files /user/user-name/.staging/job-id/ are not deleted as this can be used for debugging purposes.
          2. Upon restart the mapred.system.dir can be completely trusted and hence no checking is done here.
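
          A sketch of the size caps mentioned in points 9 and 10 above: read a length first and refuse to buffer anything over the limit (the helper is illustrative; the limits are the ones quoted above):

              import java.io.DataInput;
              import java.io.IOException;
              import org.apache.hadoop.io.WritableUtils;

              class CappedRead {
                static byte[] readCapped(DataInput in, int maxBytes) throws IOException {
                  int length = WritableUtils.readVInt(in);
                  if (length < 0 || length > maxBytes) {
                    // reject over-sized job-confs (~3 MB of chars) or split meta-info (10 MB)
                    throw new IOException("Serialized size " + length + " exceeds cap " + maxBytes);
                  }
                  byte[] bytes = new byte[length];
                  in.readFully(bytes);
                  return bytes;
                }
              }
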
          Amar Kamat added a comment -

          Attaching a patch for review. Testing in progress.

          Devaraj Das added a comment -

          Some more details on the split file handling:
          1) The FileSystem used for writing the split bytes would be the same filesystem where mapred.system.dir is located.
          2) The split info (actual split bytes) would get written to the user's home directory on that filesystem (e.g., /user/<user-name>/.mapreduce/jobid)
          3) The split info can be cleaned up by the cleanup task of the job.
          For now, let's postpone the special handling for the JobConf, and instead put a cap on the max size (like 1 MB).
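          A minimal sketch of such a client-side cap, assuming the conf is serialized to XML (the same way job.xml is written) before being sent; the 1 MB constant and method name are illustrative only:

            import java.io.ByteArrayOutputStream;
            import java.io.IOException;
            import org.apache.hadoop.conf.Configuration;

            public class JobConfSizeCheck {
              // Illustrative cap matching the 1 MB figure above.
              private static final int MAX_JOBCONF_BYTES = 1024 * 1024;

              public static void checkSize(Configuration conf) throws IOException {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                conf.writeXml(bytes);  // serialize the conf as XML to measure its size
                if (bytes.size() > MAX_JOBCONF_BYTES) {
                  throw new IOException("Serialized job conf is " + bytes.size()
                      + " bytes; the maximum allowed is " + MAX_JOBCONF_BYTES);
                }
              }
            }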

          Devaraj Das added a comment -

          I wonder whether it makes sense to have the jobclient write two files in place of the single split file:

          1) the splits info (the actual bytes) written to a secure location on the hdfs (with permissions 700)
          2) the split metadata, which is a set of entries like

          {<map-id>:<location_1><location_2>..<location_n>, <start-offset-in-split-file><length>}

          for each map-id. This is serialized over RPC, and the JobTracker writes it to the well known mapred-system-directory (which the JobTracker owns with perms 700).

          The JobTracker just reads/loads the metadata, and creates the TIP cache.

          The TaskTracker is handed off a split object that looks something like

          {<start-offset-in-split-file><length>}

          As part of task localization, the TT copies the specific bytes from the split file (securely) and launches the task, which then reads the split; alternatively, the TT could simply stream the bytes over RPC to the child. The replication factor of the splits info file could be set to a high number.

          Doing it in this way should reduce the size of the split file information considerably (and we can have a cap on the metadata size as well), and also provide security for the user generated split files' content.

          For the JobConf, passing the basic and the minimum info to the JobTracker as Hong suggested on MAPREDUCE-841 seems to make sense. For all other conf properties, the Task can load them directly from the HDFS. The max size (in terms of #bytes) of the basic information could be easily derived and we could have a cap on that for the RPC communication.

          Thoughts?
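          For illustration, a per-map metadata record along these lines could be a small Writable; the class name, fields and encoding below are assumptions, not the types introduced by the patch:

            import java.io.DataInput;
            import java.io.DataOutput;
            import java.io.IOException;
            import org.apache.hadoop.io.Text;
            import org.apache.hadoop.io.Writable;
            import org.apache.hadoop.io.WritableUtils;

            public class SplitMetaInfo implements Writable {
              private String[] locations;  // hosts where the raw split bytes are local
              private long startOffset;    // offset of this map's bytes in the split file
              private long length;         // number of bytes to read from that offset

              public void write(DataOutput out) throws IOException {
                WritableUtils.writeVInt(out, locations.length);
                for (String host : locations) {
                  Text.writeString(out, host);
                }
                WritableUtils.writeVLong(out, startOffset);
                WritableUtils.writeVLong(out, length);
              }

              public void readFields(DataInput in) throws IOException {
                locations = new String[WritableUtils.readVInt(in)];
                for (int i = 0; i < locations.length; i++) {
                  locations[i] = Text.readString(in);
                }
                startOffset = WritableUtils.readVLong(in);
                length = WritableUtils.readVLong(in);
              }
            }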

          Amar Kamat added a comment -

          MAPREDUCE-807 is one more reason why we should close mapred.system.dir

          Amar Kamat added a comment -

          Attaching a new patch [hadoop-3578-branch-20-example-2.patch] with no changes to the testcase. This patch is manually tested. This patch assumes [hadoop-3578-branch-20-example.patch] and should not be committed to branch 0.20.

          Amar Kamat added a comment -

          patch2 assumes that patch1 is applied.

          Amar Kamat added a comment -

          Attaching a patch for branch-0.20 with some bug fixes.

          This is an example patch not to be committed.

          Amar Kamat added a comment -

          Attaching a patch for branch-0.20 with some bug fixes.

          Amar Kamat added a comment -

          Attaching a sample patch for branch 0.20 not to be committed.

          Amar Kamat added a comment -

          Attaching a patch that fixes the test case. Result of test-patch

           
          [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 15 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          Ant tests passed on my box.

          Amar Kamat added a comment -

          Tested this patch on a 200 node cluster with sleepjob and the job ran fine.

          Kan Zhang added a comment -

          Thanks. I had a misconception about DistributedCache.

          Amar Kamat added a comment -

          Kan,
          This patch uploads job.jar to the staging area, i.e. ~/.staging/jobid/job.jar, and creates a symlink in the DistributedCache. With this patch, job.jar is treated similarly to libjars.
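          For illustration only, treating job.jar like any other cache file would look roughly like this; the staging path, user and job id below are hypothetical placeholders:

            import java.net.URI;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.filecache.DistributedCache;

            public class JobJarViaCache {
              public static void addJobJar(Configuration conf) throws Exception {
                // Hypothetical staging location; the client builds the real path per job.
                URI jarUri = new URI("hdfs:///user/someuser/.staging/job_000000000000_0000/job.jar#job.jar");
                DistributedCache.addCacheFile(jarUri, conf);
                // With createSymlink set, the URI fragment ("job.jar") shows up as a link
                // in the task's working directory once the file is localized.
                DistributedCache.createSymlink(conf);
              }
            }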

          Kan Zhang added a comment -

          Amar, just to clarify, in your current patch, you are uploading job.jar to DistributedCache, not the staging dir in user's home dir (~/.staging/jobid/), right? Which means the mapreduce framework doesn't need user's credentials to access and localize job.jar for tasks. Am I right?

          Amar Kamat added a comment -

          Attaching a patch. I am currently testing the patch.

          Owen O'Malley added a comment -

          I think the direction is right.

          In terms of your questions:
          1. I'd put an upper bound of 5 MB on the job conf.
          2. We probably should save each of the input split ranges in a separate file, until we have append working right.
          3. If they haven't finished their job submission in 1 hour, I'd remove it.
          4. I wouldn't worry about this case. It is unlikely that an authorized user will DDoS the job tracker. If they want to, there are more interesting approaches.

          Amar Kamat added a comment -

          the jobclient also uploads the job.jar to the DistributedCache and creates a symlink to it

          To be more precise, the jobclient copies the data from the job.jar's filesystem to HDFS and then creates the symlink to that location. As of today the files are copied to system-dir/jobid/, but now they will be copied to ~/.staging/jobid/.

          Questions :

          1. How and when to clean up the staging area?
          Amar Kamat added a comment - - edited

          Some more details

          1. The jobclient requests the jobtracker for a new job id
          2. Along with the libs/archives, the jobclient also uploads the job.jar to the DistributedCache and creates a symlink to it (here the TaskRunner will localize the jars). With HADOOP-4490 (and security in distributed cache), the taskrunner will run under the user permission and hence will be able to securely localize the job jar
          3. The jobclient now starts the transaction with the jobtracker by passing the jobconf to the jobtracker. We expect the jobconf to be lightweight and hence pass it completely over RPC.
            1. If the job (jobconf) fails the checks (acls etc) at the jobtracker, this job is ignored
            2. The jobtracker now maintains the jobid to user mapping for this job. This is done to make sure that only the user who owns the job can upload/add the splits
            3. finally the jt localizes the job to system-dir/jobid/job.xml so that the tasks are able to load the conf.
          4. The jobclient now uploads the job splits (in chunks of 1000 splits) to the jobtracker
            1. The jobtracker will check if the user is the owner of the job
            2. The jobtracker will maintain a mapping from jobid to the (split) file handle for that job
            3. This split file is opened as system-dir/jobid/job.split
            4. The jobtracker will stream all the splits passed by the client to this file
          5. The jobclient now finishes the transaction by invoking submitJob().
            1. The jobtracker will first close the open file handle for the jobsplit
            2. jt will cleanup the structures maintained for the transaction
            3. do what is done today upon a submit job (note that by now job.split and job.jar are both present in the system dir)

          Questions :

          1. What if the jobconf is large? Do we need to page it too?
          2. How many (job-split) files to support in parallel (since the number of open file handles can lead to issues)?
            1. One way to do it would be to cap it at 200 uploads in parallel
          3. How to take care of dead jobclients?
            1. Start an expiry thread that cleans up dead/hung job submissions (every 5 mins)
          4. How to prevent jobclients from passing too many splits (say 100,000) in one rpc call?
            1. Looks like this should be capped at the rpc level. I am not sure if there is any provision for something like this. For now we can leave it as it is.

          Thoughts?
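          A rough sketch of the handshake described above, expressed as an RPC interface; this is purely illustrative (the method names and types are assumptions) and is not the actual JobSubmissionProtocol:

            import java.io.IOException;
            import org.apache.hadoop.mapred.JobConf;
            import org.apache.hadoop.mapred.JobID;

            // Hypothetical interface shape for the submission transaction.
            public interface JobSubmissionTransaction {
              /** Step 1: obtain a new job id. */
              JobID getNewJobId() throws IOException;

              /** Step 3: start the transaction; the jobtracker validates the conf and records jobid -> user. */
              void startSubmission(JobID jobId, JobConf conf) throws IOException;

              /** Step 4: called repeatedly with batches (e.g. up to 1000 serialized splits per call). */
              void addSplits(JobID jobId, byte[][] rawSplits) throws IOException;

              /** Step 5: finish the transaction; the jobtracker closes the split file and starts the job. */
              void submitJob(JobID jobId) throws IOException;
            }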

          Amar Kamat added a comment -

          don't see any compelling reason for this not to be done via rpc.

          What about job.jar? Should it be passed over rpc too? Is it safe?

          Owen O'Malley added a comment -

          1. I really don't like counting on generating "unknown" random numbers. This has often led to security problems in practice as someone figures out a way to guess the numbers.

          2. I don't see any compelling reason for this not to be done via rpc. It will make the security story much much easier if only the mapred user can access the system directory. The clients would do:
          1. Get a jobid from the job tracker. (probably should pass the queue name here too, so that acls can be checked)
          2. Generate the splits.
          3. Pass the splits as RawInputSplits to the job tracker 1000 at a time.
          4. Pass the JobConf via rpc and tell the job tracker to start the job.
          The JobTracker can add them to the split file as they are received.

          Thoughts?
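          On the jobtracker side, appending batches to the split file as they arrive could look roughly like this; the file layout and the config default are assumptions for illustration:

            import java.io.IOException;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FSDataOutputStream;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            public class SplitFileWriter {
              /** Open mapred.system.dir/<jobid>/job.split for writing (illustrative layout). */
              public static FSDataOutputStream open(Configuration conf, String jobId) throws IOException {
                Path systemDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
                FileSystem fs = systemDir.getFileSystem(conf);
                return fs.create(new Path(new Path(systemDir, jobId), "job.split"));
              }

              /** Write one batch of serialized splits as it is received over RPC. */
              public static void writeBatch(FSDataOutputStream out, byte[][] rawSplits) throws IOException {
                for (byte[] split : rawSplits) {
                  out.writeInt(split.length);  // simple length-prefixed framing, for illustration
                  out.write(split);
                }
              }
            }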

          Amar Kamat added a comment -

          Here is the proposal :

          Terms :

          1. mapred.system.dir : the common location where the users (jobclient) uploads job files (job split and job jars). This dir will have rwx-w-w permissions.
          2. mapred.system.dir/jobtracker : jobtracker's private scratch space with rwx------ permissions. This is the place where the jobtracker moves files upon successful job submission (upload + validation).

          The process of job submission is as follows

          1. jobclient/user asks jobtracker for a new jobid
          2. jobclient generates a new x-digit random number and uploads the job files (split and jar) to mapred.system.dir/jobid-random-number
          3. jobclient/user passes this information and the jobconf to the jobtracker via rpc (the submitJob api).
          4. jobtracker loads the conf via the rpc and does the acls check, and only then is the job accepted (moved to mapred.system.dir/jobtracker)
          5. jobtracker serializes the job.xml (changing the location of the split and jar file info in the conf) to mapred.system.dir/jobtracker/jobid, and moves job.jar and job.split to mapred.system.dir/jobtracker/jobid (this is important because the tasktrackers rely on the information in the conf for job.jar and job.split).
          6. Upon restart all the jobs that are present in mapred.system.dir/jobtracker/ will be blindly loaded and jobs in mapred.system.dir/ will be queued for cleanup.

          Benefits :

          1. guessing job-dir will be hard as random number will be appended
          2. separation between faulty jobs (jobs failing on access etc) and accepted jobs will be clear (helps in recovery)
          3. jobtracker system dir will be clean and cannot be garbled
          4. jobconf need not be read from the fs as it will be passed via rpc; this helps in quickly deciding whether the job is faulty
          5. re-initing jobtracker is as simple as deleting jobtracker's system.dir (mapred.system.dir/jobtracker) without touching the mapred.system.dir

          Questions :

          1. Should the default api assume that job.xml, job.jar and job.split are still present in mapred.system.dir/jobid?

          Thoughts? Comments?
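          A minimal sketch of setting up the two directories with the permissions proposed above; the mode bits and the default path here are illustrative, not what the patch mandates:

            import java.io.IOException;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.fs.permission.FsPermission;

            public class SystemDirSetup {
              public static void create(Configuration conf) throws IOException {
                Path systemDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
                FileSystem fs = systemDir.getFileSystem(conf);

                // Submit area that other users can write into but not list (illustrative mode bits).
                fs.mkdirs(systemDir, new FsPermission((short) 0733));

                // Jobtracker-private scratch space, accessible only to the mapred user.
                Path jtPrivate = new Path(systemDir, "jobtracker");
                fs.mkdirs(jtPrivate, new FsPermission((short) 0700));
              }
            }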

          Doug Cutting added a comment -

          I don't see any advantages to Owen's proposal over mine. Generating a 64-bit hex name, writing a file in the jt's rwx-w---- submit directory, then passing the name of that file to the jt seems simpler to me, secure, and doesn't pollute the user's directory.

          Devaraj Das added a comment -

          The only potential issue with Owen's approach is the case where the JT lags behind in moving users' jobdirs under the submit dir (maybe because jobs are getting submitted at a high rate) and then crashes; the users would have to clean up the respective jobdirs themselves. In the case where the users write the job stuff to a directory known to the JobTracker, this issue won't arise. But granted, this issue may not happen that frequently.

          Owen O'Malley added a comment -

          Ok, some more details...

          I'd suggest using:
          system dir: perm = rwxr-xr-x, owner = mapreduce
          job dir: perm = rwx------, owner = job owner

          the job client would create the jobdir in the staging directory, which is in the user's home directory on the file system that hosts the system dir.

          When the job is submitted, we send the jobconf over rpc by making Configuration implement Writable. This will allow the job tracker to load the job conf without being a super user.

          Now the job tracker uses the credentials in the jobconf to move the directory under the system dir. This way, we get:

          • the job tracker is not a super user
          • users can not read the jobdir of other users
          • users do not have permissions to write into the system dir
          • the jobdir is written only once by the jobclient
          • it is not a big change to the current job tracker / job client

          thoughts?
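          A sketch of what sending the conf over RPC amounts to, i.e. serializing the key/value pairs with Writable primitives; this only illustrates the idea and is not the actual change that made Configuration a Writable:

            import java.io.DataInput;
            import java.io.DataOutput;
            import java.io.IOException;
            import java.util.ArrayList;
            import java.util.List;
            import java.util.Map;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.io.Text;
            import org.apache.hadoop.io.Writable;
            import org.apache.hadoop.io.WritableUtils;

            public class WritableConf implements Writable {
              private Configuration conf = new Configuration(false);

              public void write(DataOutput out) throws IOException {
                // Collect the entries first so the count can be written up front.
                List<Map.Entry<String, String>> entries = new ArrayList<Map.Entry<String, String>>();
                for (Map.Entry<String, String> e : conf) {
                  entries.add(e);
                }
                WritableUtils.writeVInt(out, entries.size());
                for (Map.Entry<String, String> e : entries) {
                  Text.writeString(out, e.getKey());
                  Text.writeString(out, e.getValue());
                }
              }

              public void readFields(DataInput in) throws IOException {
                conf = new Configuration(false);
                int count = WritableUtils.readVInt(in);
                for (int i = 0; i < count; i++) {
                  conf.set(Text.readString(in), Text.readString(in));
                }
              }
            }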

          Owen O'Malley added a comment -

          I'd propose making the submit directory local to each user.

          ~/.submit/$jobid
          

          I think it would simplify things a lot, especially since the job tracker already has the user's hdfs credentials. Note that this should likely be in the system directory's file system...
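          A small sketch of deriving such a per-user submit directory on the file system that hosts mapred.system.dir; the directory name and config default are illustrative:

            import java.io.IOException;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            public class SubmitDir {
              /** e.g. /user/<user-name>/.submit/<jobid> on the system dir's file system. */
              public static Path forJob(Configuration conf, String jobId) throws IOException {
                Path systemDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
                FileSystem fs = systemDir.getFileSystem(conf);
                return new Path(new Path(fs.getHomeDirectory(), ".submit"), jobId);
              }
            }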

          Doug Cutting added a comment -

          Sure, that could work. Perhaps the jobclient can generate the random name itself, write it to the submit directory, then submit the job, providing the random name to the jobtracker then. The jobtracker would immediately move it, so that even the submitter could not alter it subsequently.

          Amar Kamat added a comment -

          Consider the following approach
          1) The client submits to a mapred.submit.dir directory which has the rwx-w-w permission. The job-id that the JT creates contains some random component per job which would make guessing difficult.
          2) The JT moves the job (details) from the mapred.submit.dir to the mapred.system.dir which is now with rwx------ permission.
          This narrows the window of vulnerability between job submission and job acceptance. Once the job is accepted by the jobtracker, it can't be tampered with, even if the job name is known.

          Doug Cutting added a comment -

          > one could easily find out the job name by asking the jobtracker for a new job-id and replacing the last actual id with some number less than that, no?

          To make this work, job directories should not be named with the job id, but rather with a name that incorporates a random number. The job file name is already passed in the Task, so this should be a simple change.

          > I don't think dfs -rmr job_* will delete directories not owned by me, if there are no execute permissions on the parent.

          Right. Wildcard expansion is done in the client. If you cannot list a directory (execute permission) then you cannot expand wildcards in that directory.
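          For illustration, a job directory name with a random component could be generated along these lines; the naming format is hypothetical:

            import java.security.SecureRandom;

            public class RandomJobDirName {
              private static final SecureRandom RANDOM = new SecureRandom();

              /** e.g. "job_000000000000_0000-3f9c2a17b4e85d06"; only the random suffix matters here. */
              public static String dirName(String jobId) {
                return jobId + "-" + Long.toHexString(RANDOM.nextLong());
              }
            }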

          Hemanth Yamijala added a comment -

          The names of the job directories start with job_. hadoop dfs -rmr job_* would remove them, right? I tried it on my directories, and wildcard removal of directories seems to be working. So, I am assuming it will work even for the mapred system directory's children.

          Sorry, I take that back. It probably worked because they are my directories. I don't think dfs -rmr job_* will delete directories not owned by me, if there are no execute permissions on the parent.

          Hemanth Yamijala added a comment -

          Only if their names are known. Since the directory cannot be listed except by owner, if random names are used, then others cannot remove them.

          The names of the job directories start with job_. hadoop dfs -rmr job_* would remove them, right? I tried it on my directories, and wildcard removal of directories seems to be working. So, I am assuming it will work even for the mapred system directory's children.

          Amar Kamat added a comment -

          Doug,
          If the jobtracker is shared across users (which it will be) then one could easily find out the job name by asking the jobtracker for a new job-id and replacing the last actual id with some number less than that, no?

          Doug Cutting added a comment -

          > files could get deleted right [ ? ]

          Only if their names are known. Since the directory cannot be listed except by owner, if random names are used, then others cannot remove them.

          Devaraj Das added a comment -

          If the JobTracker is the only one writing to a private location then it can take care of this situation. For e.g., the JobTracker could create directories with a different name for each job (even from the same user).

          The problem with having the user-dir is that we need to make sure that over time garbage doesn't accumulate. If we put the onus on the user to clear the garbage, how does the user know for sure the jobtracker has copied the stuff over (this is one thing we need to worry about especially with restartability of jobtracker).

          To be absolutely sure that there are no security loopholes (for e.g. don't allow other users to even look at the job.xml of my job), the proposal of sending stuff over rpc makes sense. Of course, we need to fix other things like the webUI (authenticate the user before allowing him to view the job details) to make this a reality.

          Hemanth Yamijala added a comment -

          Doug, one of the issues with making directories writable is that files could get deleted, right? So it means one user's job directory created under mapred.system.dir could be deleted by another user, no? If there were a concept like the sticky bit in HDFS, it would help avoid this problem.

          Doug Cutting added a comment -

          Would it work to set mapred.system.dir to rwx-w-w, so that applications besides the JobTracker could only write files? The file name to write could be returned over RPC from the JobTracker.

          Another option is to pass the data (job.xml) to the JobTracker over RPC, then have the JobTracker write it somewhere that only it can read. The job.jar could be handled similarly.

          Amar Kamat added a comment -

          One possible way to avoid this is to let JobClient pass the job folder under user-dir. JobTracker can now copy these files into mapred.system.dir.
          mapred.system.dir can now have rwx------ permission. Thoughts/Comments?


            People

            • Assignee:
              Devaraj Das
              Reporter:
              Amar Kamat
            • Votes:
              0
              Watchers:
              14
