  Hadoop Map/Reduce: MAPREDUCE-2953

JobClient fails due to a race in RM, removes staged files and in turn crashes MR AM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2, resourcemanager
    • Labels:
      None

      Description

      Karam Singh ran into this multiple times. The MR JobClient crashes immediately after submitting the job:

      11/09/08 10:52:35 INFO mapreduce.JobSubmitter: number of splits:2094
      11/09/08 10:52:36 INFO mapred.YARNRunner: AppMaster capability = memory: 2048,
      11/09/08 10:52:36 INFO mapred.YARNRunner: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dhadoop.root.logger=INFO,console -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1315478927026 1 <FAILCOUNT> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
      11/09/08 10:52:36 INFO mapred.ResourceMgrDelegate: Submitted application application_1315478927026_1 to ResourceManager
      11/09/08 10:52:36 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/gridperf/.staging/job_1315478927026_0001
      RemoteTrace:
       at Local Trace:
              org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: failed to run job
              at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39)
              at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47)
              at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:250)
              at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:377)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1072)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1069)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
              at org.apache.hadoop.mapreduce.Job.submit(Job.java:1069)
              at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1089)
              at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:283)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
              at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:294)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
              at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
              at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
      

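      The client crashes because of a race in the ResourceManager (RM): the RM accepts the application ("Submitted application application_1315478927026_1 to ResourceManager"), yet YARNRunner.submitJob then fails with "failed to run job".

      Because the client treats the job as failed, it immediately cleans up the staging area (/user/gridperf/.staging/job_1315478927026_0001 in the log above). The MR AM, which is still launched for the accepted application, then crashes because localization of the staged files fails on the NodeManager (NM).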
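The failure chain described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: ToyResourceManager, naiveSubmit, tolerantSubmit, amCanLocalize and the "polls until visible" mechanism are all hypothetical, and the RM race is modelled simply as a number of report queries during which the RM has accepted the application but does not yet report it. The naive client does a single immediate status check and cleans up the staging area on anything unexpected, mirroring the log sequence (submitted, then "Cleaning up the staging area"). The tolerant variant is an assumed mitigation for illustration only; it is not the change made in the attached patches.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model of the submit/report race described above.
 * None of these types are Hadoop or YARN classes; all names are illustrative.
 */
public class SubmitRaceSketch {

    // FAILED is never produced by this toy RM; it stands for a definitive rejection.
    enum AppState { SUBMITTED, FAILED }

    /** Toy RM: acknowledges a submission before the app is visible to report queries. */
    static class ToyResourceManager {
        // Polls remaining before each submitted app becomes visible (the race window).
        private final Map<String, Integer> pollsUntilVisible = new HashMap<>();
        private final int registrationDelay;

        ToyResourceManager(int registrationDelay) {
            this.registrationDelay = registrationDelay;
        }

        void submit(String appId) {
            pollsUntilVisible.put(appId, registrationDelay);
        }

        /** Returns null while the app is not (yet) known, SUBMITTED once it is. */
        AppState getReport(String appId) {
            Integer remaining = pollsUntilVisible.get(appId);
            if (remaining == null) {
                return null;                      // never submitted
            }
            if (remaining > 0) {
                pollsUntilVisible.put(appId, remaining - 1);
                return null;                      // accepted but not yet registered
            }
            return AppState.SUBMITTED;
        }
    }

    /** Naive client: one immediate report check; anything unexpected counts as failure. */
    static boolean naiveSubmit(ToyResourceManager rm, String appId, Set<String> staging) {
        staging.add(appId);                       // job files are staged before submission
        rm.submit(appId);
        AppState state = rm.getReport(appId);     // may land inside the race window
        if (state == null || state == AppState.FAILED) {
            staging.remove(appId);                // cleanup deletes files the AM still needs
            return false;
        }
        return true;
    }

    /** Hypothetical mitigation: poll a bounded number of times, clean up only on FAILED. */
    static boolean tolerantSubmit(ToyResourceManager rm, String appId,
                                  Set<String> staging, int maxPolls) {
        staging.add(appId);
        rm.submit(appId);
        for (int i = 0; i < maxPolls; i++) {
            AppState state = rm.getReport(appId);
            if (state == AppState.FAILED) {
                staging.remove(appId);            // definitive failure: safe to clean up
                return false;
            }
            if (state != null) {
                return true;                      // app is visible and has not failed
            }
        }
        return false;                             // still unknown: leave staged files in place
    }

    /** The AM container can only localize if the staged job files still exist. */
    static boolean amCanLocalize(String appId, Set<String> staging) {
        return staging.contains(appId);
    }

    public static void main(String[] args) {
        Set<String> staging = new HashSet<>();

        boolean naiveOk = naiveSubmit(new ToyResourceManager(1), "app_1", staging);
        System.out.println("naive:    client ok=" + naiveOk
                + ", AM can localize=" + amCanLocalize("app_1", staging));

        boolean tolerantOk = tolerantSubmit(new ToyResourceManager(1), "app_2", staging, 5);
        System.out.println("tolerant: client ok=" + tolerantOk
                + ", AM can localize=" + amCanLocalize("app_2", staging));
    }
}
```

With a one-poll race window, the naive client reports failure and deletes the staged files, so the (still launched) AM cannot localize; the tolerant client waits out the window and leaves the files in place. The design point the sketch isolates is the distinction between "the RM does not know about the application yet" and "the RM says the application failed": only the latter justifies deleting the staging area.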

        Attachments

        1. MAPREDUCE-2953.patch
          5 kB
          Thomas Graves
        2. MAPREDUCE-2953-v2.patch
          6 kB
          Thomas Graves
        3. MAPREDUCE-2953-v3.patch
          7 kB
          Thomas Graves

              People

              • Assignee: Thomas Graves (tgraves)
              • Reporter: Vinod Kumar Vavilapalli (vinodkv)
              • Votes: 0
              • Watchers: 2
