Hadoop Map/Reduce / MAPREDUCE-3112

Calling the hadoop CLI inside a MapReduce job leads to errors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 0.23.0
    • Fix Version/s: 0.20.205.0, 0.23.0
    • Component/s: contrib/streaming
    • Labels:
      None
    • Environment:

      Java, Linux

    • Release Note:
      Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in the task attempt process.

      Description

      When running a streaming job whose mapper invokes the hadoop CLI:

      bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE

      The task log shows:

      Exception in thread "main" java.lang.ExceptionInInitializerError
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
      Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
      	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
      	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
      	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
      	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
      	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
      	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
      	... 3 more
      java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
      	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
      	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
      	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
      	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      	at org.apache.hadoop.mapred.Child.main(Child.java:255)
      

      Upon inspection, two problems in the environment inherited by the task prevent logger initialization from working properly. First, HADOOP_OPTS in hadoop-env.sh is inherited from the parent process. Users requested this as a way to override the Hadoop environment in the configuration template:

      export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
      

      In the TaskTracker environment, -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS. Hence the running task inherits the wrong log directory, which the end user may not have permission to write to. Second, the task controller overrides $HADOOP_ROOT_LOGGER to -Dhadoop.root.logger=INFO,TLA (TLA is the TaskLogAppender), so the bin/hadoop script attempts to use hadoop.root.logger=INFO,TLA and fails to initialize.
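To make the first problem concrete, the Java sketch below models only the string handling of HADOOP_OPTS as it passes from the TaskTracker into a hadoop command launched by the task. It assumes, for illustration, that the TaskTracker's HADOOP_OPTS already carries the injected -Dhadoop.log.dir value and that the child's bin/hadoop sources hadoop-env.sh once. The class, method, and path values are hypothetical; this is not Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of HADOOP_OPTS inheritance; not part of Hadoop.
class HadoopOptsInheritance {

    // Mimics the template line in hadoop-env.sh:
    //   export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
    // The inherited value is kept, so whatever the parent placed in
    // HADOOP_OPTS (including -Dhadoop.log.dir) survives into the child.
    static String sourceHadoopEnv(String inheritedOpts) {
        String base = "-Djava.net.preferIPv4Stack=true";
        if (inheritedOpts == null || inheritedOpts.isEmpty()) {
            return base;
        }
        return base + " " + inheritedOpts;
    }

    public static void main(String[] args) {
        // TaskTracker environment (illustrative values only).
        Map<String, String> taskTrackerEnv = new HashMap<>();
        taskTrackerEnv.put("HADOOP_OPTS",
                "-Dhadoop.log.dir=/var/log/hadoop/task_tracker_user");
        taskTrackerEnv.put("HADOOP_ROOT_LOGGER", "INFO,TLA");

        // The task attempt and its subprocesses inherit that environment as-is.
        Map<String, String> childEnv = new HashMap<>(taskTrackerEnv);

        // The child's bin/hadoop sources hadoop-env.sh, which prepends to HADOOP_OPTS.
        String childOpts = sourceHadoopEnv(childEnv.get("HADOOP_OPTS"));

        System.out.println("child HADOOP_OPTS = " + childOpts);
        System.out.println("child HADOOP_ROOT_LOGGER = " + childEnv.get("HADOOP_ROOT_LOGGER"));
    }
}
```

Running main shows both the TaskTracker's log directory and the INFO,TLA logger setting arriving unchanged in the child.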

      1. MAPREDUCE-3112.patch
        1 kB
        Eric Yang
      2. MAPREDUCE-3112-trunk.patch
        2 kB
        Eric Yang
      3. HAPREDUCE-3112-1.patch
        0.7 kB
        Eric Yang
      4. MAPREDUCE-3112-trunk-2.patch
        2 kB
        Eric Yang


          Activity

          Matt Foley added a comment -

          If HADOOP_OPTS is viewed as a dictionary for sharing key/value pairs among Hadoop processes, then it seems that "hadoop.log.dir" should not be in HADOOP_OPTS. Either:

          • all processes can continue to use the name "hadoop.log.dir" for this parameter, but not share it through HADOOP_OPTS. Instead, sets of processes that need to share this value can share it through some other mechanism, perhaps a <PROCESS_SET_X>_SHARED_LOG parameter list, where each such process set has a differently named list; or
          • each set of processes that CAN share a value for log location should have its own name for the log location parameter, such as "hadoop.log.dir" and "tasktracker.log.dir". Then all (or none) of these parameters could be shared in HADOOP_OPTS.
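A minimal Java sketch of the first option, assuming the simplest possible mechanism: the launching side strips hadoop.log.dir from the options string before handing it to children, so each child must set its own value. `SharedOptsFilter` and `withoutProperty` are hypothetical names, and this is not how the committed patch works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes one system property from a JVM options string
// so that it is not shared with child processes through HADOOP_OPTS.
class SharedOptsFilter {

    // Drops every "-D<key>=..." token and any bare "-D<key>" token.
    // Splits on whitespace only, so quoted values containing spaces are not supported.
    static String withoutProperty(String opts, String key) {
        if (opts == null || opts.trim().isEmpty()) {
            return "";
        }
        String withValue = "-D" + key + "=";
        String bare = "-D" + key;
        List<String> kept = new ArrayList<>();
        for (String token : opts.trim().split("\\s+")) {
            if (!token.startsWith(withValue) && !token.equals(bare)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String parentOpts = "-Djava.net.preferIPv4Stack=true "
                + "-Dhadoop.log.dir=/var/log/hadoop/task_tracker_user -Xmx200m";
        System.out.println(withoutProperty(parentOpts, "hadoop.log.dir"));
    }
}
```

Matching on "-D<key>=" rather than a bare prefix keeps similarly named properties (for example a hypothetical hadoop.log.dirs) intact.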
          Eric Yang added a comment -

          Previous releases of Hadoop did not have this problem because HADOOP_OPTS was always reconstructed from scratch in the invoking process. hadoop.log.dir is set up by the parent process to ensure that output is redirected to the desired location. The template change that lets HADOOP_OPTS be overridden was made at HCatalog's request. That request could instead be supported by moving the HADOOP_OPTS overrides to HADOOP_USER_OPTS and making HADOOP_USER_OPTS the prefix of HADOOP_OPTS.

          In streaming jobs, we should unset the HADOOP_ROOT_LOGGER environment variable so that a hadoop command invoked inside the job writes to the console, whose output the task attempt redirects to the TaskLogAppender.
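The idea of unsetting variables for the streaming subprocess can be sketched with java.lang.ProcessBuilder, whose environment() map starts as a copy of the launching JVM's environment. This is a hypothetical illustration, not the actual StreamJob.java change: it removes both variables named in the release note (HADOOP_ROOT_LOGGER and HADOOP_OPTS), and `StreamingChildEnv` and `buildMapperProcess` are illustrative names.

```java
import java.util.Map;

// Hypothetical sketch (not Hadoop's streaming code): prepare the environment
// of a streaming mapper subprocess so that a nested `hadoop` command does not
// inherit the task attempt's server-side settings.
class StreamingChildEnv {

    // Variables named in this issue's release note.
    static final String[] SERVER_ONLY_VARS = {"HADOOP_ROOT_LOGGER", "HADOOP_OPTS"};

    static ProcessBuilder buildMapperProcess(String mapperCommand) {
        ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", mapperCommand);
        // environment() is initialized from this JVM's environment; edits apply
        // only to processes started from this builder, not to the task itself.
        Map<String, String> env = pb.environment();
        for (String name : SERVER_ONLY_VARS) {
            env.remove(name);
        }
        // With HADOOP_ROOT_LOGGER unset, bin/hadoop is assumed to fall back to
        // its default console logger; the task captures the subprocess output.
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildMapperProcess("hadoop --config /etc/hadoop/ dfs -help");
        System.out.println(pb.command());
        System.out.println("HADOOP_ROOT_LOGGER passed to child: "
                + pb.environment().containsKey("HADOOP_ROOT_LOGGER"));
    }
}
```

The builder is returned unstarted, so the caller can still wire up stdin and stdout pipes the way a streaming mapper needs.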

          Eric Yang added a comment -

          Patch for branch-0.20-security.

          Eric Yang added a comment -

          Same patch for trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12496894/MAPREDUCE-3112-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/880//console

          This message is automatically generated.

          Eric Yang added a comment -

          There is no need for HADOOP_USER_OPTS because HADOOP_CLIENT_OPTS already exists. The patch has been modified so that HADOOP_CLIENT_OPTS is part of HADOOP_OPTS in the template.

          Eric Yang added a comment -

          Make sure HADOOP_OPTS contains HADOOP_CLIENT_OPTS in case a Hadoop command is executed inside a streaming job. TaskLogAppender output is streamed to the user log file.
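The direction settled on in these comments (client overrides travel through HADOOP_CLIENT_OPTS rather than an inherited HADOOP_OPTS) can be modeled in a few lines. This is a hypothetical Java sketch, not the committed hadoop-env.sh change: it assumes the client's options are rebuilt from template defaults and that HADOOP_CLIENT_OPTS is appended after them. `ClientOpts` and `composeClientOpts` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical model of building a client command's JVM options.
class ClientOpts {

    // templateDefaults stands in for what hadoop-env.sh sets. Any inherited
    // HADOOP_OPTS in env is deliberately ignored; only HADOOP_CLIENT_OPTS is read.
    static String composeClientOpts(String templateDefaults, Map<String, String> env) {
        StringJoiner opts = new StringJoiner(" ");
        if (templateDefaults != null && !templateDefaults.trim().isEmpty()) {
            opts.add(templateDefaults.trim());
        }
        // User-supplied client options are appended after the defaults.
        String clientOpts = env.get("HADOOP_CLIENT_OPTS");
        if (clientOpts != null && !clientOpts.trim().isEmpty()) {
            opts.add(clientOpts.trim());
        }
        return opts.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_OPTS", "-Dhadoop.log.dir=/var/log/hadoop/task_tracker_user"); // inherited; ignored
        env.put("HADOOP_CLIENT_OPTS", "-Xmx128m"); // illustrative user override
        System.out.println(composeClientOpts("-Djava.net.preferIPv4Stack=true", env));
    }
}
```

Because the inherited HADOOP_OPTS never enters the result, the TaskTracker's hadoop.log.dir cannot leak into the nested command, while a user (such as HCatalog) can still inject options through HADOOP_CLIENT_OPTS.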

          Arun C Murthy added a comment -

          +1

          Eric Yang added a comment -

          Updated the configuration to allow a HADOOP_CLIENT_OPTS override.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12496924/MAPREDUCE-3112-trunk-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/882//console

          This message is automatically generated.

          Ramya Sunil added a comment -

          +1 Tested the fix and was able to make dfs calls from inside mapreduce jobs.

          Matt Foley added a comment -

          Committed to 0.20-security and 0.20.205.
          Thanks, Eric! And thanks Arun and Ramya for review and test.

          Eric Yang added a comment -

          I just committed this to 0.23 and trunk. Thanks, Ramya and Matt.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1006 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1006/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178657
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1084 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1084/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178657
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1026 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1026/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178657
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #36 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/36/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178658
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #820 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/820/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178657
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #29 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/29/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178658
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #850 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/850/)
          MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment
          variable. (Eric Yang)

          eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178657
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java
          Matt Foley added a comment -

          Closed upon release of 0.20.205.0


            People

            • Assignee:
              Eric Yang
              Reporter:
              Eric Yang
            • Votes:
              0
              Watchers:
              1
