  1. Hadoop Map/Reduce
  2. MAPREDUCE-5655

Remote job submission from Windows to a Linux Hadoop cluster fails due to wrong classpath


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: client, job submission
    • Labels: None
    • Environment: Client machine is a Windows 7 box with Eclipse.
      Remote: a multi-node Hadoop cluster installed on Ubuntu boxes (any Linux)

    Description

      I was trying to run a Java class on my client, a Windows 7 developer environment, which submits a job to the remote Hadoop cluster, starts a MapReduce job there, and then downloads the results to the local machine.

      The general use case is to use Hadoop services from a web application installed on a non-cluster computer, or as part of a developer environment.

      The problem was that the ApplicationMaster's startup shell script (launch_container.sh) was generated with a wrong CLASSPATH entry. This entry, together with the java process invocation at the bottom of the file, was generated in Windows style, using % as the shell variable marker and ; as the CLASSPATH delimiter.
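
      To make the difference concrete, here is a minimal Java sketch that renders the same CLASSPATH value in both styles. The class ShellStyleDemo, its methods, and the sample entries (PWD, HADOOP_CONF_DIR, job.jar) are hypothetical illustrations; they are not taken from Hadoop or from the generated script.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): renders one classpath in two shell styles.
final class ShellStyleDemo {
    enum Style { WINDOWS, UNIX }

    // Environment variable reference: %NAME% on Windows, $NAME in a POSIX shell.
    static String envRef(String name, Style style) {
        return style == Style.WINDOWS ? "%" + name + "%" : "$" + name;
    }

    // CLASSPATH delimiter: ';' on Windows, ':' on Unix-like systems.
    static String delimiter(Style style) {
        return style == Style.WINDOWS ? ";" : ":";
    }

    // Joins variable references first, then literal entries, into one CLASSPATH value.
    static String classpath(List<String> envVars, List<String> literals, Style style) {
        StringBuilder sb = new StringBuilder();
        for (String v : envVars) {
            if (sb.length() > 0) sb.append(delimiter(style));
            sb.append(envRef(v, style));
        }
        for (String l : literals) {
            if (sb.length() > 0) sb.append(delimiter(style));
            sb.append(l);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> vars = Arrays.asList("PWD", "HADOOP_CONF_DIR");
        List<String> literals = Arrays.asList("job.jar");
        System.out.println(classpath(vars, literals, Style.WINDOWS)); // %PWD%;%HADOOP_CONF_DIR%;job.jar
        System.out.println(classpath(vars, literals, Style.UNIX));    // $PWD:$HADOOP_CONF_DIR:job.jar
    }
}
```

      A bash script on the Linux side can only make sense of the second form.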

      I tracked down the root cause and found that the MRApps.java and YARNRunner.java classes create these entries and pass them on to the ApplicationMaster, assuming that the OS running these classes matches the OS running the ApplicationMaster. That is not the case here: the two run in different JVMs, possibly on different operating systems, yet the strings are generated according to the OS of the client (submitter) side.

      I made some workaround changes to these two files so that I could launch my job; however, there may be more problems ahead.

      update
      error message:
      13/12/04 16:33:15 INFO mapreduce.Job: Job job_1386170530016_0001 failed with state FAILED due to: Application application_1386170530016_0001 failed 2 times due to AM Container for appattempt_1386170530016_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
      org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control

      at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
      at org.apache.hadoop.util.Shell.run(Shell.java:379)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)
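
      A plausible reading of the "fg: no job control" message is that bash treats a word beginning with % as a job specification, so a command that starts with a Windows-style variable such as %JAVA_HOME% is handled as an attempt to foreground a job. The Java sketch below flags such lines. The class LaunchScriptCheck and the sample lines are hypothetical and are not copied from the actual launch_container.sh.

```java
import java.util.regex.Pattern;

// Hypothetical checker (not part of Hadoop): spots Windows-style syntax in a script line.
final class LaunchScriptCheck {
    // Matches a Windows-style variable reference such as %JAVA_HOME%.
    private static final Pattern WINDOWS_VAR =
            Pattern.compile("%[A-Za-z_][A-Za-z0-9_]*%");

    // True if the line contains at least one %NAME% reference.
    static boolean hasWindowsVarSyntax(String line) {
        return WINDOWS_VAR.matcher(line).find();
    }

    // True if the first non-blank character is '%', which bash reads as a job spec.
    static boolean startsWithJobSpec(String line) {
        return line.trim().startsWith("%");
    }

    public static void main(String[] args) {
        // Illustrative lines only; the real script content is not shown in this report.
        String windowsStyle = "%JAVA_HOME%/bin/java -Xmx1024m SomeMainClass";
        String unixStyle = "$JAVA_HOME/bin/java -Xmx1024m SomeMainClass";
        System.out.println(hasWindowsVarSyntax(windowsStyle)); // true
        System.out.println(startsWithJobSpec(windowsStyle));   // true
        System.out.println(hasWindowsVarSyntax(unixStyle));    // false
    }
}
```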

      update2:
      It is also required to add the following property to
      mapred-site.xml (or mapred-default.xml) on the Windows box, so that the job launcher knows that the job will run on Linux:
      <property>
      <name>mapred.remote.os</name>
      <value>Linux</value>
      <description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
      </property>

      Without this entry, the patched jar behaves the same as the unpatched one, so the entry is required for the workaround to work.
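
      The behaviour described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the content of the attached patches: java.util.Properties stands in for Hadoop's configuration object, and the class RemoteOsResolver and its methods are hypothetical. Only the property name mapred.remote.os and its values come from this report.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch (not the attached patch): choose the shell style from configuration.
final class RemoteOsResolver {
    static final String REMOTE_OS_KEY = "mapred.remote.os";

    // Uses the configured remote OS if present; otherwise falls back to the client OS,
    // which mirrors the unpatched behaviour described in this report.
    static boolean targetIsWindows(Properties conf, String clientOsName) {
        String remote = conf.getProperty(REMOTE_OS_KEY);
        String effective = (remote == null || remote.trim().isEmpty())
                ? clientOsName
                : remote.trim();
        return effective.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // CLASSPATH delimiter for the machine that will run the container.
    static String classpathSeparator(Properties conf, String clientOsName) {
        return targetIsWindows(conf, clientOsName) ? ";" : ":";
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(REMOTE_OS_KEY, "Linux");
        // With the property set, the client OS no longer decides the separator.
        System.out.println(classpathSeparator(conf, "Windows 7")); // :
    }
}
```

      Falling back to the client OS keeps same-platform submissions working unchanged, at the cost of the cross-platform case failing silently when the property is missing.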

      Attachments

        1. MRApps.patch
          5 kB
          Attila Pados
        2. YARNRunner.patch
          0.7 kB
          Attila Pados

            People

              Assignee: JoyoungZhang@gmail.com JoneZhang
              Reporter: padisah Attila Pados
              Votes: 0
              Watchers: 7