Hadoop Map/Reduce / MAPREDUCE-2953

JobClient fails due to a race in RM, removes staged files and in turn crashes MR AM


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2, resourcemanager
    • Labels: None

    Description

      Karams ran into this multiple times. MR JobClient crashes immediately.

      11/09/08 10:52:35 INFO mapreduce.JobSubmitter: number of splits:2094
      11/09/08 10:52:36 INFO mapred.YARNRunner: AppMaster capability = memory: 2048,
      11/09/08 10:52:36 INFO mapred.YARNRunner: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dhadoop.root.logger=INFO,console -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1315478927026 1 <FAILCOUNT> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
      11/09/08 10:52:36 INFO mapred.ResourceMgrDelegate: Submitted application application_1315478927026_1 to ResourceManager
      11/09/08 10:52:36 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/gridperf/.staging/job_1315478927026_0001
      RemoteTrace:
       at Local Trace:
              org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: failed to run job
              at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39)
              at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47)
              at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:250)
              at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:377)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1072)
              at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1069)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
              at org.apache.hadoop.mapreduce.Job.submit(Job.java:1069)
              at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1089)
              at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:283)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
              at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:294)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
              at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
              at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
      

      The client crashes due to a race in RM.

      Because the client fails, it immediately removes the staged files, which in turn causes the MR AM itself to crash due to failed localization on the NM.
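
The failure chain described above (client sees a submission failure, cleans up the staging area, and the AM then cannot localize its files) can be illustrated with a small self-contained sketch. This is not the Hadoop code and not the fix from the attached patches. The class `StagingCleanupSketch`, its methods, the `job_0001` directory name and the use of a local temp directory in place of the real staging area are all hypothetical, and the RM race itself is not modelled because the report does not describe it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration only: models the ordering problem, not Hadoop internals.
public class StagingCleanupSketch {

    // Stand-in for the client staging job files before submission.
    static Path stageJobFiles(Path root, String jobId) throws IOException {
        Path staging = root.resolve(".staging").resolve(jobId);
        Files.createDirectories(staging);
        Files.write(staging.resolve("job.xml"), "<configuration/>".getBytes());
        return staging;
    }

    // Stand-in for the client's "Cleaning up the staging area" step.
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Stand-in for localization on the NM: succeeds only if the staged file still exists.
    static boolean localize(Path staging) {
        return Files.isReadable(staging.resolve("job.xml"));
    }

    // Returns whether the (simulated) AM could localize its files.
    static boolean amCanLocalize(boolean clientSeesSubmitFailure) {
        try {
            Path root = Files.createTempDirectory("staging-sketch");
            Path staging = stageJobFiles(root, "job_0001");
            if (clientSeesSubmitFailure) {
                // The client believes submission failed, so it removes the staged
                // files even though the application may already be launching.
                deleteRecursively(staging);
            }
            boolean localized = localize(staging);
            deleteRecursively(root); // remove the temp directory used by the sketch
            return localized;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client saw failure, AM localized: " + amCanLocalize(true));
        System.out.println("client saw success, AM localized: " + amCanLocalize(false));
    }
}
```

The point of the sketch is only that cleanup triggered by a spurious client-side failure removes files that an already-submitted application still depends on.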

      Attachments

        1. MAPREDUCE-2953.patch
          5 kB
          Thomas Graves
        2. MAPREDUCE-2953-v2.patch
          6 kB
          Thomas Graves
        3. MAPREDUCE-2953-v3.patch
          7 kB
          Thomas Graves

            People

              Assignee: Thomas Graves (tgraves)
              Reporter: Vinod Kumar Vavilapalli (vinodkv)
              Votes: 0
              Watchers: 2
