Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6439

AM may fail instead of retrying if RM shuts down during the allocate call

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We are seeing cases where MR AM gets a YarnRuntimeException thats thrown in RM and gets sent back to AM causing it to think that it has exhausted the number of retries. Copying the error which causes the heartbeat thread to quit.

      2015-07-25 20:07:27,346 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating with RM: java.lang.InterruptedException
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:245)
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:469)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      Caused by: java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
      	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
      	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:240)
      	... 11 more
      
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:245)
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:469)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      Caused by: java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
      	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
      	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:240)
      	... 11 more
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy36.allocate(Unknown Source)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:188)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:667)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:244)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.InterruptedException
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:245)
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:469)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      Caused by: java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
      	at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
      	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:240)
      	... 11 more
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      	at com.sun.proxy.$Proxy35.allocate(Unknown Source)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
      	... 12 more
      
      1. MAPREDUCE-6439.001.patch
        5 kB
        Anubhav Dhoot
      2. MAPREDUCE-6439.002.patch
        14 kB
        Anubhav Dhoot

        Issue Links

          Activity

          Hide
          sjlee0 Sangjin Lee added a comment -

          Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

          Show
          sjlee0 Sangjin Lee added a comment - Does this issue exist in 2.6.x? Should this be backported to branch-2.6?
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #277 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/277/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #277 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/277/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2215 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2215/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2215 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2215/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #285 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/285/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #285 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/285/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2234 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2234/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2234 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2234/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1018 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1018/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #1018 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1018/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #288 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/288/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #288 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/288/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8310 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8310/)
          MAPREDUCE-6439. AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8310 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8310/ ) MAPREDUCE-6439 . AM may fail instead of retrying if RM shuts down during the allocate call. (Anubhav Dhoot via kasha) (kasha: rev 8dfec7a1979e8f70f8355c096874921d368342ef) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMCommunicator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocationException.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          Hide
          kasha Karthik Kambatla added a comment -

          Just committed this to trunk, branch-2 and branch-2.7.

          Thanks Anubhav for the contribution, Jason and Zhijie for your inputs.

          Show
          kasha Karthik Kambatla added a comment - Just committed this to trunk, branch-2 and branch-2.7. Thanks Anubhav for the contribution, Jason and Zhijie for your inputs.
          Hide
          kasha Karthik Kambatla added a comment -

          I saw the same test failure on MAPREDUCE-5817 as well.

          The patch itself looks good to me. +1, checking this in.

          Show
          kasha Karthik Kambatla added a comment - I saw the same test failure on MAPREDUCE-5817 as well. The patch itself looks good to me. +1, checking this in.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 6s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 8m 13s There were no new javac warning messages.
          +1 javadoc 10m 32s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 38s The applied patch generated 3 new checkstyle issues (total was 129, now 132).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 26s mvn install still works.
          +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse.
          +1 findbugs 1m 11s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 mapreduce tests 9m 11s Tests failed in hadoop-mapreduce-client-app.
              49m 21s  



          Reason Tests
          Failed unit tests hadoop.mapreduce.v2.app.TestJobEndNotifier



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12749976/MAPREDUCE-6439.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 40f8151
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-app.txt
          hadoop-mapreduce-client-app test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 17m 6s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 8m 13s There were no new javac warning messages. +1 javadoc 10m 32s There were no new javadoc warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 38s The applied patch generated 3 new checkstyle issues (total was 129, now 132). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 26s mvn install still works. +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse. +1 findbugs 1m 11s The patch does not introduce any new Findbugs (version 3.0.0) warnings. -1 mapreduce tests 9m 11s Tests failed in hadoop-mapreduce-client-app.     49m 21s   Reason Tests Failed unit tests hadoop.mapreduce.v2.app.TestJobEndNotifier Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12749976/MAPREDUCE-6439.002.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 40f8151 checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-app.txt hadoop-mapreduce-client-app test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/testReport/ Java 1.7.0_55 uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5935/console This message was automatically generated.
          Hide
          adhoot Anubhav Dhoot added a comment -

          Filed followup Jira to fix MR code where we throw and catch YarnRuntimeException

          Show
          adhoot Anubhav Dhoot added a comment - Filed followup Jira to fix MR code where we throw and catch YarnRuntimeException
          Hide
          adhoot Anubhav Dhoot added a comment -

          Jira for not throwing YarnRuntimeException to client

          Show
          adhoot Anubhav Dhoot added a comment - Jira for not throwing YarnRuntimeException to client
          Hide
          adhoot Anubhav Dhoot added a comment -

          Added Unit tests to verify changes in exception handling

          Show
          adhoot Anubhav Dhoot added a comment - Added Unit tests to verify changes in exception handling
          Hide
          kasha Karthik Kambatla added a comment -

          Oh, and it would be nice to add some unit tests for the code change here.

          Show
          kasha Karthik Kambatla added a comment - Oh, and it would be nice to add some unit tests for the code change here.
          Hide
          kasha Karthik Kambatla added a comment -

          Using Yarn*Exception for communication between threads/classes in MR should be avoided. I like Anubhav's current approach (patch - v1). Anubhav Dhoot - could we file a follow-up MR JIRA to address other MR cases as Jason and Zhijie suggested. YARN-4021 could fix the way we throw YarnRTE in trunk only.

          Show
          kasha Karthik Kambatla added a comment - Using Yarn*Exception for communication between threads/classes in MR should be avoided. I like Anubhav's current approach (patch - v1). Anubhav Dhoot - could we file a follow-up MR JIRA to address other MR cases as Jason and Zhijie suggested. YARN-4021 could fix the way we throw YarnRTE in trunk only.
          Hide
          zxu zhihai xu added a comment -

          Thanks Jason Lowe and Zhijie Shen for the good suggestions! thanks Anubhav Dhoot for the new information!
          The culprit of this issue is how to differentiate between local YarnRuntimeException and remote YarnRuntimeException.
          Based on the code RPCUtil#instantiateException which instantiates the remote YarnRuntimeException

            private static <T extends Throwable> T instantiateException(
                Class<? extends T> cls, RemoteException re) throws RemoteException {
              try {
                Constructor<? extends T> cn = cls.getConstructor(String.class);
                cn.setAccessible(true);
                T ex = cn.newInstance(re.getMessage());
                ex.initCause(re);
                return ex;
          

          and the stack trace,

          Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.InterruptedException
          	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
          	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
          	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
          

          The cause of the remote YarnRuntimeException must be RemoteException.
          Can we use the cause of YarnRuntimeException to differentiate between local YarnRuntimeException and remote YarnRuntimeException?
          If it happens that local YarnRuntimeException uses RemoteException as the cause, most likely it is a bug which we should fix it at the first place. At least for this issue, it should definitely work.
          The following is the fix using the cause of YarnRuntimeException at RMCommunicator#startAllocatorThread

                      try {
                        heartbeat();
                      } catch (YarnRuntimeException e) {
                        LOG.error("Error communicating with RM: " + e.getMessage() , e);
                        if (e.getCause() instanceof RemoteException) {
                          // ignore if it is a remote YarnRuntimeException
                          continue;
                        }
                        return;
                      } catch (Exception e) {
          

          I think the fix looks like simpler and more backwards compatible.
          Thoughts ?

          Show
          zxu zhihai xu added a comment - Thanks Jason Lowe and Zhijie Shen for the good suggestions! thanks Anubhav Dhoot for the new information! The culprit of this issue is how to differentiate between local YarnRuntimeException and remote YarnRuntimeException. Based on the code RPCUtil#instantiateException which instantiates the remote YarnRuntimeException private static <T extends Throwable> T instantiateException( Class <? extends T> cls, RemoteException re) throws RemoteException { try { Constructor<? extends T> cn = cls.getConstructor( String .class); cn.setAccessible( true ); T ex = cn.newInstance(re.getMessage()); ex.initCause(re); return ex; and the stack trace, Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.InterruptedException at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) The cause of the remote YarnRuntimeException must be RemoteException. Can we use the cause of YarnRuntimeException to differentiate between local YarnRuntimeException and remote YarnRuntimeException? If it happens that local YarnRuntimeException uses RemoteException as the cause, most likely it is a bug which we should fix it at the first place. At least for this issue, it should definitely work. The following is the fix using the cause of YarnRuntimeException at RMCommunicator#startAllocatorThread try { heartbeat(); } catch (YarnRuntimeException e) { LOG.error( "Error communicating with RM: " + e.getMessage() , e); if (e.getCause() instanceof RemoteException) { // ignore if it is a remote YarnRuntimeException continue ; } return ; } catch (Exception e) { I think the fix looks like simpler and more backwards compatible. Thoughts ?
          Hide
          adhoot Anubhav Dhoot added a comment -

          Created YARN-4021 for the follow up

          Show
          adhoot Anubhav Dhoot added a comment - Created YARN-4021 for the follow up
          Hide
          adhoot Anubhav Dhoot added a comment -

          Just doing a text search for catch of some RuntimeException shows 55 hits in product code and out of this 11 are for YarnRuntimeException (In MR its in MRAppMaster, AMWebServices, HsWebServices).
          If no one has an objection, I propose we do a follow up jira to fix the rest of cases where we catch a remote YarnRuntimeException/RuntimeException and wrap the Remote RTE after we fix this one as proposed and target it to 3.0 to avoid backcompat issues.

          Show
          adhoot Anubhav Dhoot added a comment - Just doing a text search for catch of some RuntimeException shows 55 hits in product code and out of this 11 are for YarnRuntimeException (In MR its in MRAppMaster, AMWebServices, HsWebServices). If no one has an objection, I propose we do a follow up jira to fix the rest of cases where we catch a remote YarnRuntimeException/RuntimeException and wrap the Remote RTE after we fix this one as proposed and target it to 3.0 to avoid backcompat issues.
          Hide
          zjshen Zhijie Shen added a comment -

          I would think we can wrap the remote runtimeexceptions into another exception that indicates its a remote exception and you can still get the original reason by looking inside. Zhijie Shen do you agree?

          I'm a bit concerned about the compatibility for those existing apps that have already caught RTE, unless it's YarnRemoteRuntimeExeption which extends YarnRuntimeException. I'm not sure whether it's too overkilling for this problem only. Therefore, I agree it's better to go through each cases to detect the potential RTE confusing.

          Show
          zjshen Zhijie Shen added a comment - I would think we can wrap the remote runtimeexceptions into another exception that indicates its a remote exception and you can still get the original reason by looking inside. Zhijie Shen do you agree? I'm a bit concerned about the compatibility for those existing apps that have already caught RTE, unless it's YarnRemoteRuntimeExeption which extends YarnRuntimeException. I'm not sure whether it's too overkilling for this problem only. Therefore, I agree it's better to go through each cases to detect the potential RTE confusing.
          Hide
          adhoot Anubhav Dhoot added a comment -

          Unwrapping of remote RuntimeExceptions was added in YARN-731 for server side validation of arguments. I would think we can wrap the remote runtimeexceptions into another exception that indicates its a remote exception and you can still get the original reason by looking inside. Zhijie Shen do you agree?
          Before we do that we need to go over each case of client expecting a RuntimeException and see if it needs to be updated.

          Show
          adhoot Anubhav Dhoot added a comment - Unwrapping of remote RuntimeExceptions was added in YARN-731 for server side validation of arguments. I would think we can wrap the remote runtimeexceptions into another exception that indicates its a remote exception and you can still get the original reason by looking inside. Zhijie Shen do you agree? Before we do that we need to go over each case of client expecting a RuntimeException and see if it needs to be updated.
          Hide
          jlowe Jason Lowe added a comment -

          I think a Mapreduce AM client-side fix is the safest from a backwards compatibility point of view. However it would be nice if clients could know that only IOExceptions and YarnExceptions and no escapable runtime exceptions are going to come from remote calls. If that's the case then the fix you propose for the allocate call would need to be replicated for the other calls and arguably other PBServiceImpls for other YARN protocols.

          Show
          jlowe Jason Lowe added a comment - I think a Mapreduce AM client-side fix is the safest from a backwards compatibility point of view. However it would be nice if clients could know that only IOExceptions and YarnExceptions and no escapable runtime exceptions are going to come from remote calls. If that's the case then the fix you propose for the allocate call would need to be replicated for the other calls and arguably other PBServiceImpls for other YARN protocols.
          Hide
          zxu zhihai xu added a comment -

          Thanks for working on this issue Anubhav Dhoot! Based on the stack trace, it looks like this issue is because AM(YARN IPC Client) can't differentiate between local YarnRuntimeException and remote YarnRuntimeException.
          It looks like we can either fix it at the client side(AM) or at the serve side(RM).
          Your attached patch is a client side fix, which replaces local YarnRuntimeException with a new exception RMContainerAllocationException. I think the patch will work, but I feel it looks more like a workaround

          It looks like there are several ways to fix it at the serve side(RM).
          I can think of the following two server side fixes.

          1. Similar as Anubhav's first option, Translates YarnRuntimeException to YarnException by calling RPCUtil#getRemoteException. Maybe
            we can do the translations at ApplicationMasterProtocolPBServiceImpl#allocate
              public AllocateResponseProto allocate(RpcController arg0,
                  AllocateRequestProto proto) throws ServiceException {
                AllocateRequestPBImpl request = new AllocateRequestPBImpl(proto);
                try {
                  AllocateResponse response = real.allocate(request);
                  return ((AllocateResponsePBImpl)response).getProto();
                } catch (YarnException e) {
                  throw new ServiceException(e);
                } catch (YarnRuntimeException e) {
                  throw new ServiceException(RPCUtil.getRemoteException(e));
                } catch (IOException e) {
                  throw new ServiceException(e);
                }
              }
            

            YARN-635 Rename YarnRemoteException to YarnException and YarnException to YarnRuntimeException. So this change looks like more backward compatible and more generalized.

          1. Don't translate InterruptedException to YarnRuntimeException at AsyncDispatcher. Currently the InterruptedException is hidden in AsyncDispatcher. The stack trace shows the YarnRuntimeException from AsyncDispatcher is sent to AM. Because lots of code uses AsyncDispatcher. This change may not be easy and safe.
                  } catch (InterruptedException e) {
                    if (!stopped) {
                      LOG.warn("AsyncDispatcher thread interrupted", e);
                    }
                    // Need to reset drained flag to true if event queue is empty,
                    // otherwise dispatcher will hang on stop.
                    drained = eventQueue.isEmpty();
                    throw new YarnRuntimeException(e);
                  }
            

          Jason Lowe - do you think any of my earlier suggestions are reasonable?

          Show
          zxu zhihai xu added a comment - Thanks for working on this issue Anubhav Dhoot ! Based on the stack trace, it looks like this issue is because AM(YARN IPC Client) can't differentiate between local YarnRuntimeException and remote YarnRuntimeException. It looks like we can either fix it at the client side(AM) or at the serve side(RM). Your attached patch is a client side fix, which replaces local YarnRuntimeException with a new exception RMContainerAllocationException. I think the patch will work, but I feel it looks more like a workaround It looks like there are several ways to fix it at the serve side(RM). I can think of the following two server side fixes. Similar as Anubhav's first option, Translates YarnRuntimeException to YarnException by calling RPCUtil#getRemoteException. Maybe we can do the translations at ApplicationMasterProtocolPBServiceImpl#allocate public AllocateResponseProto allocate(RpcController arg0, AllocateRequestProto proto) throws ServiceException { AllocateRequestPBImpl request = new AllocateRequestPBImpl(proto); try { AllocateResponse response = real.allocate(request); return ((AllocateResponsePBImpl)response).getProto(); } catch (YarnException e) { throw new ServiceException(e); } catch (YarnRuntimeException e) { throw new ServiceException(RPCUtil.getRemoteException(e)); } catch (IOException e) { throw new ServiceException(e); } } YARN-635 Rename YarnRemoteException to YarnException and YarnException to YarnRuntimeException. So this change looks like more backward compatible and more generalized. Don't translate InterruptedException to YarnRuntimeException at AsyncDispatcher. Currently the InterruptedException is hidden in AsyncDispatcher. The stack trace shows the YarnRuntimeException from AsyncDispatcher is sent to AM. Because lots of code uses AsyncDispatcher. This change may not be easy and safe. } catch (InterruptedException e) { if (!stopped) { LOG.warn( "AsyncDispatcher thread interrupted" , e); } // Need to reset drained flag to true if event queue is empty, // otherwise dispatcher will hang on stop. drained = eventQueue.isEmpty(); throw new YarnRuntimeException(e); } Jason Lowe - do you think any of my earlier suggestions are reasonable?
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 15m 19s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 39s There were no new javac warning messages.
          +1 javadoc 9m 43s There were no new javadoc warning messages.
          +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 22s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 22s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 5s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 mapreduce tests 9m 8s Tests passed in hadoop-mapreduce-client-app.
              45m 35s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12748100/MAPREDUCE-6439.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 469cfcd
          hadoop-mapreduce-client-app test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/testReport/
          Java 1.7.0_55
          uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 pre-patch 15m 19s Findbugs (version ) appears to be broken on trunk. +1 @author 0m 0s The patch does not contain any @author tags. -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac 7m 39s There were no new javac warning messages. +1 javadoc 9m 43s There were no new javadoc warning messages. +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 22s There were no new checkstyle issues. +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 22s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 1m 5s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 mapreduce tests 9m 8s Tests passed in hadoop-mapreduce-client-app.     45m 35s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12748100/MAPREDUCE-6439.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 469cfcd hadoop-mapreduce-client-app test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/testReport/ Java 1.7.0_55 uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5926/console This message was automatically generated.
          Hide
          adhoot Anubhav Dhoot added a comment -

          Attaching a patch with the second approach

          Show
          adhoot Anubhav Dhoot added a comment - Attaching a patch with the second approach
          Hide
          adhoot Anubhav Dhoot added a comment -

          The second option is we do not use YarnRuntimeException for communicating retry exhaustion and ApplicationAttemptNotFoundException from RMCommunicatorAllocator to RMCommunicator. We instead use a specific exception type defined in MR to denote this information and use that instead.

          Show
          adhoot Anubhav Dhoot added a comment - The second option is we do not use YarnRuntimeException for communicating retry exhaustion and ApplicationAttemptNotFoundException from RMCommunicatorAllocator to RMCommunicator. We instead use a specific exception type defined in MR to denote this information and use that instead.
          Hide
          adhoot Anubhav Dhoot added a comment -

          This seems to happen because RPCUtil will serialize over any RemoteException derived type which includes YarnRuntimeException. The problem is the same exception is used by RMContainerAllocator#getResources to tell RMCommunicator#startAllocatorThread that it has exhausted retries. And hence there are no retries attempted.
          So to fix this we should avoid confusing a remote YarnRuntimeException as a local way to detect retry attempt exhaustion.
          One possibility is we catch all remote exceptions and wrap them as YarnException which is to indicate its a remote exception.

          Show
          adhoot Anubhav Dhoot added a comment - This seems to happen because RPCUtil will serialize over any RemoteException derived type which includes YarnRuntimeException. The problem is the same exception is used by RMContainerAllocator#getResources to tell RMCommunicator#startAllocatorThread that it has exhausted retries. And hence there are no retries attempted. So to fix this we should avoid confusing a remote YarnRuntimeException as a local way to detect retry attempt exhaustion. One possibility is we catch all remote exceptions and wrap them as YarnException which is to indicate its a remote exception.

            People

            • Assignee:
              adhoot Anubhav Dhoot
              Reporter:
              adhoot Anubhav Dhoot
            • Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development