I was trying to run a Java class on my client, a Windows 7 developer environment, which submits a job to a remote Hadoop cluster, starts a MapReduce job there, and then downloads the results back to the local machine.
The general use case is using Hadoop services from a web application installed on a non-cluster computer, or as part of a developer environment.
The problem was that the ApplicationMaster's startup shell script (launch_container.sh) was generated with a wrong CLASSPATH entry. That entry, together with the java process call at the bottom of the file, was generated in Windows style, using % as the shell variable marker and ; as the CLASSPATH delimiter.
I tracked down the root cause and found that the MRApps.java and YARNRunner.java classes create these entries and pass them on to the ApplicationMaster, assuming that the OS running these classes matches the OS running the ApplicationMaster. That is not the case here: the two run in different JVMs, possibly on different operating systems, and the strings are generated according to the OS of the client (submitter) side.
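To illustrate the mismatch, here is a minimal sketch of building the variable references and the CLASSPATH string from the target OS instead of the submitting JVM's OS. It is not the actual patch and not Hadoop's API: the class CrossPlatformClasspath, the TargetOs enum, the helper methods, and the sample entries are all hypothetical names chosen for illustration. On a Linux node, bash reads a leading % as a job specifier, which would be consistent with the "fg: no job control" error in the log in this report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class CrossPlatformClasspath {

    // OS of the node that will run the container, not of the submitting client.
    enum TargetOs { LINUX, WINDOWS }

    // Shell variable reference in the syntax of the target OS.
    static String envVar(String name, TargetOs os) {
        return os == TargetOs.WINDOWS ? "%" + name + "%" : "$" + name;
    }

    // CLASSPATH delimiter of the target OS.
    static String pathSeparator(TargetOs os) {
        return os == TargetOs.WINDOWS ? ";" : ":";
    }

    // Joins the entries with the delimiter of the target OS.
    static String buildClasspath(List<String> entries, TargetOs os) {
        return String.join(pathSeparator(os), entries);
    }

    public static void main(String[] args) {
        for (TargetOs os : TargetOs.values()) {
            // Sample entries, assumed for the demo.
            List<String> entries =
                Arrays.asList(envVar("HADOOP_CONF_DIR", os), "job.jar/classes/");
            System.out.println(os + " -> " + buildClasspath(entries, os));
        }
    }
}
```

The point of the sketch is that the target OS is an explicit argument. Code that instead consults the local JVM (for example via File.pathSeparator or the os.name system property) will produce Windows-style strings whenever the client is a Windows machine, regardless of where the container runs.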
I made some workaround changes to these two files so that I could launch my job; however, there may be more problems ahead. Without the changes, the submission fails like this:
13/12/04 16:33:15 INFO mapreduce.Job: Job job_1386170530016_0001 failed with state FAILED due to: Application application_1386170530016_0001 failed 2 times due to AM Container for appattempt_1386170530016_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
The workaround also requires adding the following property to mapred-site.xml (or mapred-default.xml) on the Windows box, so that the job launcher knows that the job will run on Linux:
<description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
Without this entry the patched jar behaves exactly like the unpatched one, so the property is required for the workaround to take effect.