Hadoop Map/Reduce
MAPREDUCE-1697

Document the behavior of the -file option in streaming and deprecate it in favor of the generic -files option.

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Documented the behavior of the -file option in streaming and deprecated it in favor of the generic -files option.

      Description

      The behavior of the -file option in streaming is not documented anywhere.
      The behavior of -file is as follows:
      1) All the files passed through the -file option are packaged into job.jar.
      2) If the -file option is used for .class or .jar files, they are unjarred on the tasktracker and placed in ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/classes or /lib, respectively. Symlinks to the directories classes and lib are created from the cwd of the task; the symlinks are named "classes" and "lib". So the file names of the .class or .jar files do not appear in the cwd of the task.
      Paths to these files are automatically added to the classpath. The tricky part is that the Hadoop framework can pick up the .class or .jar files using the classpath, but the actual mapper script cannot. If you'd like to access these .class or .jar files inside the script, do something like "java -cp 'lib/*:classes/' <ClassName>" (on Unix-like systems, where ':' is the classpath separator).
      3) If the -file option is used for files other than .class or .jar (e.g., .txt or .pl), these files are unjarred into ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/. Symlinks to these files are created from the cwd of the task. The names of these symlinks are the file names themselves.
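
The three rules above can be summarized in a small sketch. This is not Hadoop code: the function name task_view and the sample paths are hypothetical, and the sketch assumes each shipped file sits at the top level of job.jar (for example, a .class file in the default package). It only restates where a file passed with -file ends up and which symlink name a task sees in its cwd.

```python
import posixpath


def task_view(local_dir, job_id, filename):
    """Return (unpacked_path, symlink_name) for a file shipped with -file.

    Illustrative only: mirrors the rules in the description, assuming the
    file sits at the top level of job.jar.
    """
    jars_dir = posixpath.join(local_dir, "taskTracker", "jobcache", job_id, "jars")
    if filename.endswith(".class"):
        # .class files land under jars/classes; the task cwd only gets
        # a symlink named "classes", not one named after the file.
        return posixpath.join(jars_dir, "classes", filename), "classes"
    if filename.endswith(".jar"):
        # .jar files land under jars/lib; the task cwd only gets "lib".
        return posixpath.join(jars_dir, "lib", filename), "lib"
    # Any other file is unjarred directly into jars/ and symlinked by name.
    return posixpath.join(jars_dir, filename), filename
```

With this sketch, a script such as mapper.pl is reachable by its own name from the task cwd, while a shipped util.jar is only reachable through the "lib" symlink.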

      1. patch-1697.txt
        1 kB
        Amareshwari Sriramadasu
      2. patch-1697-1.txt
        1 kB
        Amareshwari Sriramadasu
      3. patch-1697-2.txt
        4 kB
        Amareshwari Sriramadasu
      4. patch-1697-3.txt
        4 kB
        Amareshwari Sriramadasu

          Activity

          Amareshwari Sriramadasu added a comment -

          Patch updating the documentation

          Amareshwari Sriramadasu added a comment -

          Ran ant docs both on trunk and branch 0.21, with the patch. It ran successfully.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445045/patch-1697.txt
          against trunk revision 946578.

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/540/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/540/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/540/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/540/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          The test failures are clearly not related to the patch. They failed because of a NoClassDefFoundError (MAPREDUCE-1275).

          Amareshwari Sriramadasu added a comment -

          Made some editorial changes to the earlier patch, as suggested by Ravi offline.

          Ran ant docs on both trunk and branch 0.21; it ran successfully.

          Ravi Gummadi added a comment -

          Latest patch looks good.
          +1

          Ravi Gummadi added a comment -

          I missed mentioning this earlier.
          In streaming.xml, under "Streaming Command Options", the description of -file is as follows:

          -file filename    Optional    Make the mapper, reducer, or combiner executable available locally on the compute nodes

          Should we change this to a description similar to the one in StreamJob.exitUsage(), instead of saying that the -file option can take only mapper/reducer/combiner executables?

          Amareshwari Sriramadasu added a comment -

          Streaming -info (StreamJob.exitUsage) says

          -file     <file>     File/dir to be shipped in the Job jar file

          When I tried passing a directory through the -file option, the contents of the directory were added to the job jar, not the directory itself.
          After MAPREDUCE-967, because the contents of the passed directory are not added to the jar unpack pattern, the files/dirs inside the passed directory are not unjarred. Thus they are not symlinked from the cwd of the task. I raised MAPREDUCE-1826 for this. We can update the "behavior of passing a directory through -file option" in MAPREDUCE-1826 itself.

          Along with the documentation changes, I would like to deprecate the -file option in this jira in favor of MAPREDUCE-574.
          Thoughts?
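
For illustration, a job that currently ships files with -file could be switched to the generic -files option roughly as sketched below. The helper name streaming_args, the jar name, and the file names are hypothetical. The sketch relies on two points about generic options: -files takes a comma-separated list, and generic options have to be placed before the streaming options on the command line.

```python
def streaming_args(streaming_jar, files, mapper, reducer, input_path, output_path):
    """Build an argv list that ships `files` with the generic -files option.

    Illustrative helper only; it assembles arguments and does not run Hadoop.
    """
    args = ["hadoop", "jar", streaming_jar]
    if files:
        # Generic options such as -files go before the streaming options
        # and take a single comma-separated value.
        args += ["-files", ",".join(files)]
    # Streaming options follow the generic options.
    args += ["-input", input_path, "-output", output_path]
    args += ["-mapper", mapper, "-reducer", reducer]
    return args
```

The resulting list can be passed to a process launcher such as subprocess.run, or joined with spaces to inspect the command line before submitting the job.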

          Ravi Gummadi added a comment -

          +1 for just adding a deprecation warning message and keeping the behavior of -file the same as it is now in this release. Then, in the next release (0.23), we can remove the -file option.

          Amareshwari Sriramadasu added a comment -

          The patch deprecates the -file option and updates the usage message to describe the actual behavior of the -file option.

          Ravi Gummadi added a comment -

          Patch looks good.
          +1

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446011/patch-1697-2.txt
          against trunk revision 950021.

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          -1 contrib tests.

          TestSimulatorDeterministicReplay failed because of a timeout. The test also timed out without the patch on my local machine. Raised MAPREDUCE-1834 for the same.

          Amareshwari Sriramadasu added a comment -

          Ran the tests in branch 0.21 also. All tests passed, except org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal (which passed on a re-run) and TestSimulatorDeterministicReplay, which timed out (MAPREDUCE-1835 and MAPREDUCE-1834).

          Amareshwari Sriramadasu added a comment -

          Patch with minor editorial changes to documentation, suggested by Vinod offline.

          Ran ant docs with the patch on both trunk and branch 0.21.

          Vinod Kumar Vavilapalli added a comment -

          +1 for the patch. I built the docs and verified how it looks too. I'm going to check this into trunk and 0.21.

          Vinod Kumar Vavilapalli added a comment -

          I just committed this to trunk and 0.21. Thanks Amareshwari!


            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0
              Watchers:
              2
