YARN-10438: Handle null containerId in ClientRMService#getContainerReport()

Details

    Description

      Here is the exception trace we are seeing. We suspect that this exception puts the RM into a state where it no longer allows any new job to run on the cluster.

      2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
      java.lang.NullPointerException
          at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
          at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
          at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
      

      We see this issue with this one specific node only. The cluster runs at a scale of around 500 nodes.
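
      The issue title asks for a null check on containerId. Below is a minimal sketch of what such a guard could look like, assuming the NullPointerException comes from dereferencing a null containerId taken from the request. Everything in it is a simplified, hypothetical stand-in: ContainerReportSketch, its nested ContainerId and ContainerNotFoundException, the register() helper, and the error message text are illustrations only, not the real Hadoop classes or the actual patch.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM-side service; NOT the real ClientRMService.
final class ContainerReportSketch {

    /** Simplified stand-in for YARN's ContainerId (assumption, not the real class). */
    static final class ContainerId {
        final String applicationId;
        final long sequence;

        ContainerId(String applicationId, long sequence) {
            this.applicationId = Objects.requireNonNull(applicationId);
            this.sequence = sequence;
        }
    }

    /** Stand-in for a checked "not found" error that is reported back to the caller. */
    static final class ContainerNotFoundException extends Exception {
        ContainerNotFoundException(String message) {
            super(message);
        }
    }

    // Illustrative lookup table: application id -> report text.
    private final Map<String, String> reportsByApp = new ConcurrentHashMap<>();

    void register(String applicationId, String report) {
        reportsByApp.put(applicationId, report);
    }

    String getContainerReport(ContainerId containerId) throws ContainerNotFoundException {
        // Guard first: reject a null id with a descriptive, expected exception
        // instead of letting a later dereference raise NullPointerException.
        if (containerId == null) {
            throw new ContainerNotFoundException("Invalid container id: null");
        }
        String report = reportsByApp.get(containerId.applicationId);
        if (report == null) {
            throw new ContainerNotFoundException(
                "No report for application " + containerId.applicationId);
        }
        return report;
    }

    public static void main(String[] args) {
        ContainerReportSketch service = new ContainerReportSketch();
        service.register("application_1", "RUNNING");
        try {
            service.getContainerReport(null);
        } catch (ContainerNotFoundException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

      With a guard like this, a malformed request surfaces to the client as a regular "not found" style error rather than as an unhandled NullPointerException in the IPC handler. Whether that also resolves the suspected scheduling stall is not established by the trace above.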

            People

              shubhamod Shubham Gupta
              raghvendra.s Raghvendra Singh
              Votes: 0
              Watchers: 6
