Hadoop Common / HADOOP-3722

Provide a unified way to pass jobconf options from bin/hadoop

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: conf
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed StreamJob (streaming) and Submitter (pipes) to implement Tool and Configurable, and to use the GenericOptionsParser arguments -fs, -jt, -conf, -D, -libjars, -files, and -archives. Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec in streaming and pipes in favor of the generic options. Removed -config, -mapred.job.tracker, and -cluster from streaming.

      Description

      Often when running a job it is useful to override some jobconf parameters from jobconf.xml for that particular job, for example to set the job priority, the number of reduce tasks, or the HDFS replication level. Currently the Hadoop examples, streaming, pipes, etc. take these extra jobconf parameters in different ways: the examples in hadoop-examples.jar use -Dkey=value, streaming uses -jobconf key=value, and pipes uses -jobconf key1=value1,key2=value2,... Things would be simpler if bin/hadoop could take the jobconf parameters itself, so that you could run, for example, bin/hadoop -Dkey=value jar [whatever] as well as bin/hadoop -Dkey=value pipes [whatever]. This is especially useful when an organization needs to require users to set a particular property, e.g. the name of a queue to use for scheduling in HADOOP-3445. Otherwise, users may confuse one way of passing parameters with another and may not notice that they forgot to include certain properties.

      I propose letting bin/hadoop accept jobconf options specified as -C key=value. Each one would be passed to Java as the system property hadoop.jobconf.key=value. The Configuration class would then be modified to read any system properties that begin with the hadoop.jobconf. prefix and use them to override the values in hadoop-site.xml.
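A minimal Java sketch of the Configuration side of this proposal follows. It assumes the hadoop.jobconf. prefix described above and models the configuration as a plain Map rather than Hadoop's Configuration class. The names JobconfOverrides and applyOverrides are hypothetical; they are not taken from the attached HADOOP-3722.patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the proposed override step; not code from the patch.
final class JobconfOverrides {
    // Prefix that "bin/hadoop -C key=value" would add, i.e. -Dhadoop.jobconf.key=value.
    static final String PREFIX = "hadoop.jobconf.";

    private JobconfOverrides() {}

    // Returns a copy of siteConf in which every system property named
    // "hadoop.jobconf.<key>" overrides (or adds) the entry for <key>.
    // Properties without the prefix are ignored; siteConf is not modified.
    static Map<String, String> applyOverrides(Map<String, String> siteConf, Properties sys) {
        Map<String, String> merged = new HashMap<>(siteConf);
        for (String name : sys.stringPropertyNames()) {
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                merged.put(name.substring(PREFIX.length()), sys.getProperty(name));
            }
        }
        return merged;
    }
}
```

Under this scheme, bin/hadoop -C mapred.reduce.tasks=4 jar ... would start the JVM with -Dhadoop.jobconf.mapred.reduce.tasks=4. Calling applyOverrides(siteConf, System.getProperties()) would then let that value win over hadoop-site.xml. No change to the tool's own argument parsing would be needed.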

      I can write a patch for this pretty quickly if the design is sound. If there's a better way of specifying jobconf parameters uniformly across Hadoop commands, let me know.

      1. HADOOP-3722.patch
        2 kB
        Matei Zaharia
      2. jobconfoptions_v1.patch
        47 kB
        Enis Soztutar
      3. jobconfoptions_v2.patch
        48 kB
        Enis Soztutar

          Activity

          Robert Chansler added a comment -

          This issue
          1. Changed StreamJob (of streaming) and Submitter (of pipes) to implement Tool and Configurable. Streaming and Submitter now accept the GenericOptionsParser arguments:
          -fs, -jt, -conf, -D, -libjars, -files, -archives

          2. Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec in streaming and pipes (where applicable) in favor of the generic options. The deprecated options still work but issue a warning; they may be removed in a future release.

          3. Removed from streaming:
          -config: it is not documented anywhere.
          -mapred.job.tracker: it sets the wrong property, so it is not currently used.
          -cluster: setting it produces an "Unexpected -cluster while processing" error, so it is not currently used.
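Accepting the generic options means they are stripped from the command line before the tool parses its own arguments. Below is a simplified, hypothetical Java model of that stripping step; it is not Hadoop's GenericOptionsParser. It assumes that generic options precede the tool-specific ones (the usual bin/hadoop command [genericOptions] [commandOptions] layout). It accepts both "-D key=value" and the attached "-Dkey=value" form. It maps -fs and -jt to the 0.19-era keys fs.default.name and mapred.job.tracker. Where -libjars, -files and -archives end up is an assumption: the sketch simply records them by option name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of generic-option stripping; NOT Hadoop's
// GenericOptionsParser. Generic options are assumed to precede tool options.
final class GenericOptionsSketch {
    // Generic options from the release note; each takes one argument.
    private static final Set<String> WITH_ARG = new HashSet<>(Arrays.asList(
        "-fs", "-jt", "-conf", "-D", "-libjars", "-files", "-archives"));

    final Map<String, String> conf = new HashMap<>();    // key=value settings
    final List<String> confFiles = new ArrayList<>();    // -conf resource files
    final Map<String, String> shipped = new HashMap<>(); // -libjars/-files/-archives (hypothetical holder)
    final String[] remaining;                            // left for the tool's own parser

    GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length) {
            String opt = args[i];
            if (opt.startsWith("-D") && opt.length() > 2) { // attached form: -Dkey=value
                putPair(opt.substring(2));
                i += 1;
                continue;
            }
            if (!WITH_ARG.contains(opt) || i + 1 >= args.length) {
                break; // first tool-specific token: stop stripping here
            }
            String value = args[i + 1];
            switch (opt) {
                case "-D":    putPair(value); break;
                case "-fs":   conf.put("fs.default.name", value); break;
                case "-jt":   conf.put("mapred.job.tracker", value); break;
                case "-conf": confFiles.add(value); break;
                default:      shipped.put(opt.substring(1), value); break;
            }
            i += 2;
        }
        remaining = Arrays.copyOfRange(args, i, args.length);
    }

    // Splits "key=value" at the first '='; malformed pairs are ignored.
    private void putPair(String pair) {
        int eq = pair.indexOf('=');
        if (eq > 0) {
            conf.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }
}
```

With arguments such as -D mapred.reduce.tasks=2 -input in -output out, only -input in -output out would reach streaming's own parser. The reduce-task count would already be in the configuration.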

          Hudson added a comment -

          Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/ )

          Enis Soztutar added a comment -

          Added a release note.

          Arun C Murthy added a comment -

          Enis, could you please add a detailed 'Release Note' for this jira? Thanks!

          Enis Soztutar added a comment -

          The patch only deprecates parameters, issuing a warning, and introduces new ones. However, some parameters in streaming, like -cluster, were not working, so I just removed them.

          dhruba borthakur added a comment -

          This appears to be an incompatible change. I am wondering whether the older ways of submitting job parameters were deprecated (but still work in 0.19) or removed completely.

          Arun C Murthy added a comment -

          I just committed this. Thanks, Enis!

          Arun C Murthy added a comment -

          OTOH, I've changed my mind - I believe it's fine to commit this as-is and deal with the consequences later since this is an important cleanup.

          Arun C Murthy added a comment -

          +1, this is looking great!

          I'll get some 'expert' Streaming users to take a brief look and then go ahead and commit this.

          Enis Soztutar added a comment -

          The failing test is not related to this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389435/jobconfoptions_v2.patch
          against trunk revision 692409.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3189/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3189/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3189/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3189/console

          This message is automatically generated.

          Enis Soztutar added a comment -

          Fixed findbugs warning.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389176/jobconfoptions_v1.patch
          against trunk revision 690641.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3149/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3149/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3149/console

          This message is automatically generated.

          Enis Soztutar added a comment -

          This patch

          1. changes StreamJob (of streaming) and Submitter (of pipes) to implement Tool and Configurable.
          2. deprecates -jobconf, -cacheArchive and -dfs.
          3. removes some never-used parameters from streaming.
          4. adds a call to GenericOptionsParser#printGenericCommandUsage() in the printUsage() methods of StreamJob and Submitter.
          5. updates the pipes and streaming docs.

          I would really appreciate it if someone with real streaming / pipes usage could test this out.
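Here is a rough sketch of the Tool/Configurable shape that items 1 and 4 describe. It is written against minimal stand-in interfaces (MiniConfigurable, MiniTool, MiniToolRunner) so that it runs without Hadoop on the classpath. These names, and the DemoJob tool, are hypothetical. In Hadoop the real types are org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool and ToolRunner, and the configuration is a Configuration object rather than a Map. The runner here only understands "-D key=value". The generic usage text is a paraphrase, not the exact output of GenericOptionsParser#printGenericCommandUsage().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configurable.
interface MiniConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

// Hypothetical stand-in for org.apache.hadoop.util.Tool.
interface MiniTool extends MiniConfigurable {
    int run(String[] args) throws Exception;
}

// Hypothetical stand-in for ToolRunner: folds leading "-D key=value" pairs
// into the configuration, hands it to the tool, then runs the tool on the
// remaining arguments only.
final class MiniToolRunner {
    private MiniToolRunner() {}

    static int run(Map<String, String> conf, MiniTool tool, String[] args) throws Exception {
        int i = 0;
        while (i + 1 < args.length && "-D".equals(args[i])) {
            String pair = args[i + 1];
            int eq = pair.indexOf('=');
            if (eq > 0) {
                conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            i += 2;
        }
        tool.setConf(conf);
        return tool.run(Arrays.copyOfRange(args, i, args.length));
    }

    // Paraphrase of the generic-options help that a tool appends to its usage.
    static String genericUsage() {
        return "Generic options: -conf <file> -D <property=value> -fs <local|namenode:port>"
            + " -jt <local|jobtracker:port> -files <list> -libjars <list> -archives <list>";
    }
}

// Hypothetical tool playing the role of StreamJob or Submitter.
final class DemoJob implements MiniTool {
    private Map<String, String> conf = new HashMap<>();
    final List<String> seenArgs = new ArrayList<>();

    @Override
    public void setConf(Map<String, String> conf) { this.conf = conf; }

    @Override
    public Map<String, String> getConf() { return conf; }

    @Override
    public int run(String[] args) {
        if (args.length == 0) {
            System.err.println(usage()); // no tool arguments: print tool + generic usage
            return 1;
        }
        seenArgs.addAll(Arrays.asList(args));
        return 0;
    }

    // Tool-specific usage followed by the generic options (item 4 above).
    String usage() {
        return "Usage: DemoJob -input <path> -output <path>\n" + MiniToolRunner.genericUsage();
    }
}
```

The point of the pattern is that DemoJob never sees the -D pair. The runner folds it into the configuration before run() is called, which is what lets streaming and pipes share one option syntax.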

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385687/HADOOP-3722.patch
          against trunk revision 676069.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2837/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2837/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2837/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2837/console

          This message is automatically generated.

          Chris Douglas added a comment -

          +1 for Enis's solution.

          That said, a solution like the one in Matei's patch might be a tolerable, short-term bridge between 0.17 and 0.18 for user code affected by HADOOP-3417 (discussion in HADOOP-3743).

          Enis Soztutar added a comment -

          Ideally we should:

          1. change StreamJob (of streaming) and Submitter (of pipes) to implement Tool and Configurable.
          2. keep the configuration-modifying code in StreamJob and Submitter, but change it to display a deprecation warning about its use, in favor of -D name=value pairs.
          3. remove the compatible -jt and -fs options from StreamJob/Submitter, and deprecate the incompatible ones (for example -dfs).
          4. add a call to GenericOptionsParser#printGenericCommandUsage() in the printUsage() methods of StreamJob and Submitter.
          5. remove the -jobconf parameters at a later stage.
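Item 2 could look roughly like the following hypothetical Java shim. It keeps the legacy -jobconf option working but records a deprecation warning for each use. It accepts both the streaming form (key=value) and the pipes form (key1=value1,key2=value2) described in the issue. The class name and warning text are illustrative, not taken from the committed patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical deprecation shim: legacy "-jobconf" still sets configuration
// values, but each use adds a warning that points users at "-D".
final class DeprecatedJobconfShim {
    final Map<String, String> conf = new HashMap<>();
    final List<String> warnings = new ArrayList<>();
    final List<String> otherArgs = new ArrayList<>();

    DeprecatedJobconfShim(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-jobconf".equals(args[i]) && i + 1 < args.length) {
                String spec = args[++i];
                warnings.add("-jobconf is deprecated; use -D key=value instead"); // illustrative text
                // Accept "k=v" (streaming) and "k1=v1,k2=v2" (pipes). A value that
                // itself contains a comma would be split wrongly; not handled here.
                for (String pair : spec.split(",")) {
                    int eq = pair.indexOf('=');
                    if (eq > 0) {
                        conf.put(pair.substring(0, eq), pair.substring(eq + 1));
                    }
                }
            } else {
                otherArgs.add(args[i]); // everything else is left for the normal parser
            }
        }
    }
}
```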
          Arun C Murthy added a comment -

          I'm with Chris on this one; I don't think we need yet another way to pass config options in addition to -Dkey=value and -jobconf. Rather, we need to standardize. So it makes sense to pick one (-D or -jobconf) and stick with it. Yes, that means we will need to fix streaming/pipes or ToolRunner, and we should.

          Matei Zaharia added a comment -

          Regarding making streaming, pipes, etc. use ToolRunner: I think that could be more complicated than it seems, because you'd need to change the existing argument parsing in those libraries. People who have modified their streaming or pipes implementations would also have trouble (for example, we have a modified streaming at Facebook). New tool implementers can choose to use ToolRunner if they want, but this method lets you write a simple Java class that calls submitJob and still receive parameters from bin/hadoop.

          Chris Douglas added a comment -

          Good idea. Since the -D key=value syntax is managed by the Tool/ToolRunner, er, toolchain (see HADOOP-1425 and HADOOP-1436), it might make more sense to make streaming, pipes, etc. use that instead of pushing this functionality into the bash script and Java properties. Similarly, replacing the bash script with a Java launcher (per work in/related to HADOOP-3281, HADOOP-435) and using the aforementioned classes would also solve this issue, no?

          Matei Zaharia added a comment -

          Here's a patch that lets you use bin/hadoop -C property=value [command].


            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Matei Zaharia
            • Votes:
              0
              Watchers:
              11
