  1. Hadoop Common
  2. HADOOP-3722

Provide a unified way to pass jobconf options from bin/hadoop


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: conf
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note: Changed streaming's StreamJob and pipes' Submitter to implement Tool and Configurable and to accept the GenericOptionsParser arguments -fs, -jt, -conf, -D, -libjars, -files, and -archives. Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec in streaming and pipes in favor of the generic options. Removed -config, -mapred.job.tracker, and -cluster from streaming.
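      As a rough illustration of the "generic options come first" convention behind this change, here is a simplified, self-contained Java sketch. It is not Hadoop's GenericOptionsParser or ToolRunner: the class GenericArgsSketch and its split method are hypothetical, only -D is handled (-fs, -jt, -conf, -libjars, -files, and -archives are left out), and the property names are just examples. Parsing stops at the first argument that is not a generic option, and everything from there on is handed to the tool unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified sketch of splitting generic -D options from a
 * tool's own arguments. This is NOT Hadoop's GenericOptionsParser.
 */
public class GenericArgsSketch {

    /** Result of the split: generic properties plus the remaining tool arguments. */
    static final class Split {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> toolArgs = new ArrayList<>();
    }

    static Split split(String[] argv) {
        Split result = new Split();
        int i = 0;
        while (i < argv.length) {
            String arg = argv[i];
            String pair;
            if (arg.equals("-D") && i + 1 < argv.length) {
                pair = argv[i + 1];      // "-D key=value" (two tokens)
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2); // "-Dkey=value" (one token)
                i += 1;
            } else {
                break;                   // first non-generic argument: stop parsing
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            result.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        // Everything after the generic options belongs to the tool itself.
        result.toolArgs.addAll(Arrays.asList(argv).subList(i, argv.length));
        return result;
    }

    public static void main(String[] args) {
        Split s = split(new String[] {
            "-D", "mapred.reduce.tasks=4", "-Dmapred.job.priority=HIGH",
            "-input", "in", "-output", "out"});
        System.out.println(s.properties); // {mapred.reduce.tasks=4, mapred.job.priority=HIGH}
        System.out.println(s.toolArgs);   // [-input, in, -output, out]
    }
}
```

      Accepting both -D key=value and -Dkey=value is a choice made in this sketch; check the GenericOptionsParser documentation for the exact syntax a given Hadoop version accepts.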

    Description

      Often when running a job it is useful to override some jobconf parameters from jobconf.xml for that particular job: for example, setting the job priority, the number of reduce tasks, or the HDFS replication level. Currently the Hadoop examples, streaming, pipes, etc. take these extra jobconf parameters in different ways: the examples in hadoop-examples.jar use -Dkey=value, streaming uses -jobconf key=value, and pipes uses -jobconf key1=value1,key2=value2,etc.

      Things would be simpler if bin/hadoop could take the jobconf parameters itself, so that you could run, for example, bin/hadoop -Dkey=value jar [whatever] as well as bin/hadoop -Dkey=value pipes [whatever]. This is especially useful when an organization needs to require users to set a particular property, e.g. the name of the queue to use for scheduling in HADOOP-3445. Otherwise, users may confuse one way of passing parameters with another and may not notice that they forgot to include certain properties.
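      To make the inconsistency concrete, the following hypothetical Java sketch reads all three syntaxes described above into a single key/value map. LegacyJobconfSketch and parse are illustrative names, not Hadoop code, the property names are only examples, and quoting and escaping are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: reads the three jobconf syntaxes (examples, streaming,
 * pipes) into one map. Not Hadoop code; quoting/escaping is ignored.
 */
public class LegacyJobconfSketch {

    static Map<String, String> parse(String[] argv) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (int i = 0; i < argv.length; i++) {
            String arg = argv[i];
            if (arg.startsWith("-D") && arg.length() > 2) {
                // hadoop-examples style: -Dkey=value
                putPair(conf, arg.substring(2));
            } else if (arg.equals("-jobconf") && i + 1 < argv.length) {
                // streaming style (-jobconf key=value) and pipes style
                // (-jobconf key1=value1,key2=value2): both split on commas here
                for (String pair : argv[++i].split(",")) {
                    putPair(conf, pair);
                }
            }
            // Any other argument is ignored by this sketch.
        }
        return conf;
    }

    private static void putPair(Map<String, String> conf, String pair) {
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected key=value, got: " + pair);
        }
        conf.put(pair.substring(0, eq), pair.substring(eq + 1));
    }

    public static void main(String[] args) {
        String[] examples = {"-Dmapred.reduce.tasks=4"};
        String[] streaming = {"-jobconf", "mapred.reduce.tasks=4"};
        String[] pipes = {"-jobconf", "mapred.reduce.tasks=4,mapred.job.priority=HIGH"};
        System.out.println(parse(examples));  // {mapred.reduce.tasks=4}
        System.out.println(parse(streaming)); // {mapred.reduce.tasks=4}
        System.out.println(parse(pipes));     // {mapred.reduce.tasks=4, mapred.job.priority=HIGH}
    }
}
```

      Because this sketch cannot tell a streaming -jobconf from a pipes one, it splits both on commas, so a streaming value that itself contains a comma would be broken apart; that kind of ambiguity is one more argument for a single syntax.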

      I propose adding support in bin/hadoop for specifying jobconf options with -C key=value. This would set hadoop.jobconf.key=value in Java's system properties. The Configuration class would then be modified to read any system properties whose names begin with hadoop.jobconf and use them to override the values in hadoop-site.xml.
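      The proposal can be sketched in a few lines of self-contained Java. Everything here is hypothetical: JobconfOverrideSketch, applyDashC, and effectiveConf are illustrative names, a plain map stands in for the values loaded from hadoop-site.xml, and a private Properties object stands in for System.getProperties() so the sketch does not touch global JVM state. In the proposal itself, bin/hadoop (a shell script) would presumably pass each -C key=value to the JVM as -Dhadoop.jobconf.key=value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of the -C proposal. Names are illustrative; this is
 * not Hadoop's Configuration class.
 */
public class JobconfOverrideSketch {

    static final String PREFIX = "hadoop.jobconf.";

    /** Stand-in for what bin/hadoop would do with "-C key=value". */
    static void applyDashC(String keyValue, Properties sys) {
        int eq = keyValue.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected key=value, got: " + keyValue);
        }
        sys.setProperty(PREFIX + keyValue.substring(0, eq), keyValue.substring(eq + 1));
    }

    /** Start from the site values, then let prefixed system properties override them. */
    static Map<String, String> effectiveConf(Map<String, String> siteValues, Properties sys) {
        Map<String, String> conf = new HashMap<>(siteValues);
        for (String name : sys.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                // Strip the prefix to recover the original jobconf key.
                conf.put(name.substring(PREFIX.length()), sys.getProperty(name));
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for values read from hadoop-site.xml.
        Map<String, String> site = new HashMap<>();
        site.put("mapred.reduce.tasks", "1");
        site.put("dfs.replication", "3");

        // A private Properties object stands in for System.getProperties().
        Properties sys = new Properties();
        applyDashC("mapred.reduce.tasks=10", sys);

        Map<String, String> conf = effectiveConf(site, sys);
        System.out.println(conf.get("mapred.reduce.tasks")); // 10
        System.out.println(conf.get("dfs.replication"));     // 3
    }
}
```

      Because the prefix strips cleanly back to the original key, a property set with -C overrides the site file without Configuration needing to know which command (jar, streaming, pipes) was run.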

      I can write a patch for this pretty quickly if the design is sound. If there's a better way of specifying jobconf parameters uniformly across Hadoop commands, let me know.

      Attachments

        1. HADOOP-3722.patch
          2 kB
          Matei Zaharia
        2. jobconfoptions_v1.patch
          47 kB
          Enis Soztutar
        3. jobconfoptions_v2.patch
          48 kB
          Enis Soztutar


            People

              Assignee: Enis Soztutar (enis)
              Reporter: Matei Zaharia (matei@eecs.berkeley.edu)
              Votes: 0
              Watchers: 11
