Hadoop Map/Reduce
MAPREDUCE-3112

Calling hadoop cli inside mapreduce job leads to errors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 0.23.0
    • Fix Version/s: 0.20.205.0, 0.23.0
    • Component/s: contrib/streaming
    • Labels: None
    • Environment: Java, Linux

    • Release Note:
      Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.

      Description

      When running a streaming job whose mapper calls the hadoop CLI, for example:

      bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE

      Task log shows:

      Exception in thread "main" java.lang.ExceptionInInitializerError
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
      Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
      	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
      	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
      	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
      	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
      	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
      	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
      	... 3 more
      java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
      	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
      	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
      	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
      	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      	at org.apache.hadoop.mapred.Child.main(Child.java:255)
      

      Upon inspection, two problems in the inherited environment prevent the logger from initializing properly. First, in hadoop-env.sh, HADOOP_OPTS is inherited from the parent process. Users requested this as a way to override the Hadoop environment, so the configuration template contains:

      export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
      

      -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the tasktracker environment. The running task therefore inherits the wrong logging directory, one the end user might not have permission to write to. Second, $HADOOP_ROOT_LOGGER is overridden to -Dhadoop.root.logger=INFO,TLA by the task controller, so the bin/hadoop script attempts to use hadoop.root.logger=INFO,TLA but fails to initialize.
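
The effect of the two inherited variables can be illustrated with a small user-side sketch. This is not the attached patch, which changes what the task attempt process inherits. The helper name run_without_server_env and the stand-in variable values are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): run a command without the two
# variables that a task inherits from the server environment.
run_without_server_env() {
  (
    # The subshell keeps the unset local, so the caller's environment
    # is left untouched.
    unset HADOOP_OPTS HADOOP_ROOT_LOGGER
    exec "$@"
  )
}

# Stand-in values modeled on the description; on a real cluster they are
# set by the tasktracker and the task controller, not by the user.
HADOOP_OPTS="-Dhadoop.log.dir=/var/log/hadoop/task_tracker_user"
HADOOP_ROOT_LOGGER="INFO,TLA"
export HADOOP_OPTS HADOOP_ROOT_LOGGER

# Print what a child process launched through the helper would see.
run_without_server_env sh -c 'echo "opts=${HADOOP_OPTS:-<unset>} logger=${HADOOP_ROOT_LOGGER:-<unset>}"'
```

A mapper script could wrap its call the same way, for example run_without_server_env hadoop --config /etc/hadoop/ dfs -help. Whether that alone is sufficient on a given cluster is not something this issue confirms.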

      1. MAPREDUCE-3112.patch
        1 kB
        Eric Yang
      2. MAPREDUCE-3112-trunk.patch
        2 kB
        Eric Yang
      3. HAPREDUCE-3112-1.patch
        0.7 kB
        Eric Yang
      4. MAPREDUCE-3112-trunk-2.patch
        2 kB
        Eric Yang

          Activity

          Eric Yang created issue
          Eric Yang made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-3112.patch [ 12496892 ]
          Eric Yang made changes -
          Attachment MAPREDUCE-3112-trunk.patch [ 12496894 ]
          Eric Yang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.
          Affects Version/s 0.23.0 [ 12315570 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Matt Foley made changes -
          Link This issue blocks MAPREDUCE-3080 [ MAPREDUCE-3080 ]
          Eric Yang made changes -
          Attachment HAPREDUCE-3112-1.patch [ 12496918 ]
          Eric Yang made changes -
          Attachment HAPREDUCE-3112-1.patch [ 12496921 ]
          Eric Yang made changes -
          Attachment HAPREDUCE-3112-1.patch [ 12496918 ]
          Eric Yang made changes -
          Attachment MAPREDUCE-3112-trunk-2.patch [ 12496924 ]
          Eric Yang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Eric Yang
            • Reporter:
              Eric Yang
            • Votes:
              0
            • Watchers:
              1
