Description
When I execute a pi job with arguments:
-Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=500M 1 1000
and I have one node with 5GB of resource1, the following exception is logged every second and the job hangs:
2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 8030, call Call#386 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.31.119.172:58138
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[resource1] < 0 or greater than maximum allowed allocation. Requested resource=<memory:200, vCores:1, resource1: 500M>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 5G>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807G>
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
	at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
This is because org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest does not take resource units into account: the requested value (500, in units of M) is compared against the maximum (5, in units of G) without first converting both to a common unit, so the request appears to exceed the maximum even though 500M is far less than 5G.
However, if I start a job with arguments:
-Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 1 1000
and I still have 5GB of resource1 on one node, the job runs successfully.
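A minimal, self-contained sketch of the suspected comparison bug (this is not the actual SchedulerUtils code; the unit table and method names are illustrative, and a real fix would presumably normalize via YARN's UnitsConversionUtil). It shows why a raw numeric comparison rejects 500M against a 5G maximum while 1G happens to pass:
{code:java}
import java.util.Map;

public class UnitCompareDemo {
  // Multipliers relative to the base unit, assuming SI-style 1000-based
  // units as used for YARN resource types ("", k, M, G).
  private static final Map<String, Long> MULTIPLIER =
      Map.of("", 1L, "k", 1_000L, "M", 1_000_000L, "G", 1_000_000_000L);

  // Buggy check: ignores units and compares raw numbers only.
  static boolean naiveFits(long requested, long maximum) {
    return requested <= maximum;
  }

  // Unit-aware check: converts both sides to the base unit first.
  static boolean unitAwareFits(long requested, String reqUnit,
                               long maximum, String maxUnit) {
    return requested * MULTIPLIER.get(reqUnit)
        <= maximum * MULTIPLIER.get(maxUnit);
  }

  public static void main(String[] args) {
    // Run 1: 500M requested against a 5G maximum.
    System.out.println(naiveFits(500, 5));                 // false -> request rejected
    System.out.println(unitAwareFits(500, "M", 5, "G"));   // true  -> should have fit
    // Run 2: 1G requested against a 5G maximum passes even with the naive check.
    System.out.println(naiveFits(1, 5));                   // true
  }
}
{code}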
I also tried a third run: I requested 1GB of resource1 while no node offered any amount of resource1, then restarted the node that has 5GB of resource1. The job ultimately completed, but only after the node with enough resources registered in the RM, which is the desired behaviour.
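The exception message above says the maximum allowed allocation is calculated from the maximum resource of registered NodeManagers rather than from the configured maximum alone. A hedged sketch of that derivation (effectiveMax and the list-based node model here are assumptions for illustration, not RM internals), showing why the third run can only proceed once the 5GB node registers:
{code:java}
import java.util.List;

public class EffectiveMaxDemo {
  // Effective maximum for a resource type: the largest capability among
  // registered NodeManagers, capped by the configured maximum.
  static long effectiveMax(long configuredMax, List<Long> registeredNodeCapacities) {
    long largestNode = registeredNodeCapacities.stream()
        .mapToLong(Long::longValue).max().orElse(0L);
    return Math.min(configuredMax, largestNode);
  }

  public static void main(String[] args) {
    // The log shows the configured maximum for resource1 is Long.MAX_VALUE.
    long configuredMax = Long.MAX_VALUE;
    // Before the restart: no node offers resource1, so nothing can fit.
    System.out.println(effectiveMax(configuredMax, List.of()));                  // 0
    // After the 5GB node registers, a 1GB request fits (values in base units).
    System.out.println(effectiveMax(configuredMax, List.of(5_000_000_000L)));    // 5G
  }
}
{code}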
Issue Links
- is related to YARN-7739 DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation (Resolved)