Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.20.205.0, 0.23.0
-
None
-
Java, Linux
Description
When running a streaming job with mapper
bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE
Task log shows:
Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895) Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable. at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874) at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604) at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336) at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310) at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142) ... 3 more java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311) at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545) at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:261) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.Child.main(Child.java:255)
Upon inspection, there are two problems in the inherited from environment which prevent the logger initialization to work properly. In hadoop-env.sh, the HADOOP_OPTS is inherited from the parent process. This configuration was requested by user to have a way to override HADOOP environment in the configuration template:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
-Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the tasktracker environment. Hence, the running task would inherit the wrong logging directory, which the end user might not have sufficient access to write. Second, $HADOOP_ROOT_LOGGER is override to: -Dhadoop.root.logger=INFO,TLA by the task controller, therefore, the bin/hadoop script will attempt to use hadoop.root.logger=INFO,TLA, but fail to initialize.
Attachments
Attachments
Issue Links
- blocks
-
MAPREDUCE-3080 dfs calls from streaming fails with ExceptionInInitializerError
- Open