  1. Hadoop Map/Reduce
  2. MAPREDUCE-6317

InvalidResourceRequestException should be handled properly when the requested vcores are not available

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: applicationmaster, mr-am
    • Labels:
      None
    • Environment:

      1 RM, 1 DN

      Description

      Configure yarn.nodemanager.resource.cpu-vcores=2 for the NM.
      Set mapreduce.map.cpu.vcores=5 while running the sleep job on the client.

      2015-04-10 20:37:26,111 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
      org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=5, maxVirtualCores=2
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:213)
      	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:97)
      	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:502)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2142)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2138)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2136)
      
      	at sun.reflect.GeneratedConstructorAccessor17.newInstance(Unknown Source)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
      	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
      	at com.sun.proxy.$Proxy34.allocate(Unknown Source)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:199)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:686)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:257)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:281)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: 2015-04-10 20:37:27,117 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not contact RM after 360000 milliseconds.
      2015-04-10 20:37:27,173 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Error communicating with RM: Could not contact RM after 360000 milliseconds.
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not contact RM after 360000 milliseconds.
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:712)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:257)
      	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:281)
      	at java.lang.Thread.run(Thread.java:745)
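
The first trace shows the RM rejecting the request because of a bounds check on the requested vcores. A minimal, self-contained sketch of such a check follows; it only mirrors the condition stated in the log message and is not the actual SchedulerUtils code. The class and method names are hypothetical.

```java
// Hypothetical sketch: mirrors the condition quoted in the log message,
// "requested virtual cores < 0, or requested virtual cores > max configured".
final class VcoreRequestCheckSketch {

    /** Returns true when the request is within the allowed range. */
    static boolean isValid(int requestedVcores, int maxVcores) {
        return requestedVcores >= 0 && requestedVcores <= maxVcores;
    }

    public static void main(String[] args) {
        // Values taken from the report: requestedVirtualCores=5, maxVirtualCores=2.
        System.out.println(isValid(5, 2)); // prints "false"
    }
}
```

With the values from this report the request can never pass the check, so repeating the same request does not help until the maximum changes.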
      

      The AM reports an RM communication timeout, and the job fails after 2 application attempts.
      The InvalidResourceRequestException is not handled in RMContainerAllocator:

        @SuppressWarnings("unchecked")
        private List<Container> getResources() throws Exception {
          applyConcurrentTaskLimits();
      
          // will be null the first time
          Resource headRoom =
              getAvailableResources() == null ? Resources.none() :
                  Resources.clone(getAvailableResources());
          AllocateResponse response;
          /*
           * If contact with RM is lost, the AM will wait MR_AM_TO_RM_WAIT_INTERVAL_MS
           * milliseconds before aborting. During this interval, AM will still try
           * to contact the RM.
           */
          try {
            response = makeRemoteRequest();
            // Reset retry count if no exception occurred.
            retrystartTime = System.currentTimeMillis();
          } catch (ApplicationAttemptNotFoundException e ) {
      

      RMContainerAllocator should handle the InvalidResourceRequestException and wait until a NodeManager with the requested resources is added.
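
One way to picture the requested behavior is to classify the failure before applying the RM contact timeout. The sketch below is an assumption-laden illustration, not Hadoop code: `InvalidRequestException` is a hypothetical stand-in for InvalidResourceRequestException, and the `Action` enum and `onAllocateFailure` method are invented for this example. The 360000 ms interval is the value shown in the log above.

```java
import java.io.IOException;

// Hypothetical sketch of the proposed handling; none of these names exist in Hadoop.
final class AllocatorErrorHandlingSketch {

    /** Stand-in for YARN's InvalidResourceRequestException. */
    static final class InvalidRequestException extends Exception {
        InvalidRequestException(String message) {
            super(message);
        }
    }

    enum Action { KEEP_WAITING_FOR_CAPACITY, RETRY_CONTACT, ABORT_RM_UNREACHABLE }

    /**
     * Decides what to do after a failed allocate call.
     * msSinceLastContact is the time since the last successful RM call.
     */
    static Action onAllocateFailure(Exception error, long msSinceLastContact, long waitIntervalMs) {
        if (error instanceof InvalidRequestException) {
            // The RM answered, so this is not a connectivity problem and
            // should not count toward the contact timeout.
            return Action.KEEP_WAITING_FOR_CAPACITY;
        }
        if (msSinceLastContact >= waitIntervalMs) {
            // Genuine loss of contact for the whole wait interval.
            return Action.ABORT_RM_UNREACHABLE;
        }
        return Action.RETRY_CONTACT;
    }

    public static void main(String[] args) {
        long waitIntervalMs = 360_000L; // interval reported in the log
        Exception invalid = new InvalidRequestException("requestedVirtualCores=5, maxVirtualCores=2");
        Exception network = new IOException("connection refused");
        System.out.println(onAllocateFailure(invalid, 400_000L, waitIntervalMs));
        System.out.println(onAllocateFailure(network, 400_000L, waitIntervalMs));
        System.out.println(onAllocateFailure(network, 1_000L, waitIntervalMs));
    }
}
```

Separating the two cases keeps the "Could not contact RM" error for real connectivity failures, while an invalid request is reported as what it is.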

            People

            • Assignee:
              bibinchundatt Bibin A Chundatt
              Reporter:
              bibinchundatt Bibin A Chundatt
