Hadoop Map/Reduce / MAPREDUCE-2953

JobClient fails due to a race in RM, removes staged files and in turn crashes MR AM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2, resourcemanager
    • Labels:
      None

      Description

      Karam Singh ran into this multiple times. MR JobClient crashes immediately.

      11/09/08 10:52:35 INFO mapreduce.JobSubmitter: number of splits:2094
      11/09/08 10:52:36 INFO mapred.YARNRunner: AppMaster capability = memory: 2048,
      11/09/08 10:52:36 INFO mapred.YARNRunner: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dhadoop.root.logger=INFO,console -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1315478927026 1 <FAILCOUNT> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
      11/09/08 10:52:36 INFO mapred.ResourceMgrDelegate: Submitted application application_1315478927026_1 to ResourceManager
      11/09/08 10:52:36 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/gridperf/.staging/job_1315478927026_0001
      RemoteTrace:
       at Local Trace:
              org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: failed to run job
              at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39)
              at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47)
              at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:250)
              at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:377)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1072)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1069)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
              at org.apache.hadoop.mapreduce.Job.submit(Job.java:1069)
              at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1089)
              at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:283)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
              at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:294)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
              at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
              at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
      

      The client crashes due to a race in RM.

      Because the client fails, it immediately removes the staged files, which in turn causes the MR AM itself to crash when localization fails on the NM.
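
The cascade described above can be illustrated with a small, self-contained Java sketch. It is not Hadoop code and does not reproduce the RM race itself, which this report does not detail. It only models the consequence, under the assumption that the client deletes its staged files as soon as it sees a failure and that the AM later needs those files to localize. The class `StagingCleanupSketch`, the methods `localize` and `amStarts`, and the file name `job.xml` as used here are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: none of these names are Hadoop APIs.
public class StagingCleanupSketch {

    /** Stand-in for the AM localizing its job files from the staging area. */
    static boolean localize(Path stagingDir) {
        return Files.exists(stagingDir.resolve("job.xml"));
    }

    /**
     * Simulates one job submission and reports whether the AM could localize.
     * When clientSeesFailure is true, the client observes a (spurious) failure
     * and removes the staged file before the AM gets to localize it.
     */
    static boolean amStarts(boolean clientSeesFailure) {
        try {
            Path staging = Files.createTempDirectory("staging");
            Path jobFile = staging.resolve("job.xml");
            Files.createFile(jobFile);
            try {
                if (clientSeesFailure) {
                    // Mirrors the "Cleaning up the staging area" step in the client log.
                    Files.delete(jobFile);
                }
                return localize(staging);
            } finally {
                // Remove the temporary files created by this simulation.
                Files.deleteIfExists(jobFile);
                Files.deleteIfExists(staging);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("AM localizes when client succeeds: " + amStarts(false));
        System.out.println("AM localizes when client fails early: " + amStarts(true));
    }
}
```

In this model the second call returns false because the staged file is gone by the time localization runs, which is the same ordering problem the description attributes to the early client failure.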

      1. MAPREDUCE-2953-v3.patch
        7 kB
        Thomas Graves
      2. MAPREDUCE-2953-v2.patch
        6 kB
        Thomas Graves
      3. MAPREDUCE-2953.patch
        5 kB
        Thomas Graves


            People

            • Assignee:
              Thomas Graves
            • Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0
            • Watchers:
              2
