Hadoop YARN / YARN-10791

Graceful decommission causes NPE during rolling upgrade from 2.6 to 3.2


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: RM
    • Labels: None

    Description

      We are performing a rolling upgrade of YARN from 2.6.0 to 3.2.1, and we hit the following exception while upgrading the NodeManagers (NMs).

      When we exclude a node and run refreshNodes with graceful decommissioning, all MapReduce ApplicationMasters (MR AMs) fail.

      2021-05-28 11:36:35,790 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
      java.lang.NullPointerException
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:883)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:821)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:316)
      at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
      at java.lang.Thread.run(Thread.java:745)

      This happens because we gracefully decommission nodes while jobs are still running with the 2.6 MR client.

      The 2.6 MR version of handleUpdatedNodes cannot recognize the node state DECOMMISSIONING.

      To work around this, I added a configuration option that decides whether the RM sends the DECOMMISSIONING state to AMs.
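      The attached patch is not reproduced here, so the following is only a sketch of how such a switch might behave on the RM side, not the patch's actual code. It assumes a hypothetical boolean flag `reportDecommissioningToAms` (not the patch's real property name) and a hypothetical `NodeUpdate` class; when the flag is off, node reports in the DECOMMISSIONING state are left out of the list returned to AMs.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class DecommissioningFilterSketch {

    // Node states a 3.2 RM can report (hand-written mirror for illustration).
    enum NodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED, DECOMMISSIONING, SHUTDOWN
    }

    // Hypothetical minimal node report: host name plus state.
    static final class NodeUpdate {
        final String host;
        final NodeState state;

        NodeUpdate(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    // States hidden from AMs when the hypothetical flag is disabled.
    // SHUTDOWN could be added here too, since 2.6 MR cannot decode it either.
    static final Set<NodeState> HIDDEN_WHEN_DISABLED = EnumSet.of(NodeState.DECOMMISSIONING);

    // Flag on: pass all updates through. Flag off: drop hidden states.
    static List<NodeUpdate> filterForAm(List<NodeUpdate> updates,
                                        boolean reportDecommissioningToAms) {
        if (reportDecommissioningToAms) {
            return updates;
        }
        List<NodeUpdate> visible = new ArrayList<>();
        for (NodeUpdate u : updates) {
            if (!HIDDEN_WHEN_DISABLED.contains(u.state)) {
                visible.add(u);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<NodeUpdate> updates = new ArrayList<>();
        updates.add(new NodeUpdate("node44", NodeState.RUNNING));
        updates.add(new NodeUpdate("node46", NodeState.DECOMMISSIONING));

        // With the flag disabled, only node44 is reported to the AM.
        for (NodeUpdate u : filterForAm(updates, false)) {
            System.out.println(u.host + " " + u.state);
        }
    }
}
```

      Dropping the report, rather than rewriting its state, means a legacy AM presumably learns about the node only once it reaches DECOMMISSIONED, a state 2.6 does understand.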

      I am not sure whether this needs to be fixed; I am just proposing a solution for this situation.

      There are two nodes in the cluster, and the AM runs on node 44. I excluded node 46, the other node in the cluster, ran refreshNodes, and the error above occurred.

      As mentioned above, I think the root cause is a compatibility problem in NodeStateProto.

      2.6 MR cannot recognize DECOMMISSIONING or SHUTDOWN.
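      A minimal, self-contained sketch of why an unrecognized state surfaces as an NPE rather than a clear error. Assumptions: the 2.6 client turns a state it has no constant for into null (plausibly because the proto2 Java parser stores an unrecognized enum number as an unknown field, leaving the node-state field unset), and the 2.6 handleUpdatedNodes dereferences that state without a null check, which is consistent with the stack trace above. `OldNodeState`, `decodeOnOldClient`, and the two counting methods are hypothetical stand-ins, not Hadoop code.

```java
import java.util.Arrays;
import java.util.List;

public class DecommissioningNpeSketch {

    // Hypothetical mirror of the node states a 2.6 client knows about.
    // DECOMMISSIONING and SHUTDOWN are absent, per the issue report.
    enum OldNodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED;

        // Modelled on NodeState.isUnusable(); exact 2.6 semantics assumed.
        boolean isUnusable() {
            return this == UNHEALTHY || this == DECOMMISSIONED || this == LOST;
        }
    }

    // Assumption: a state the old client cannot map decodes to null.
    static OldNodeState decodeOnOldClient(String stateFromRm) {
        for (OldNodeState s : OldNodeState.values()) {
            if (s.name().equals(stateFromRm)) {
                return s;
            }
        }
        return null;
    }

    // Unguarded pattern: dereferences the decoded state directly.
    static int countUnusable(List<String> reportedStates) {
        int unusable = 0;
        for (String reported : reportedStates) {
            OldNodeState state = decodeOnOldClient(reported);
            if (state.isUnusable()) { // NPE when state == null
                unusable++;
            }
        }
        return unusable;
    }

    // Null-safe variant: skips states this client cannot interpret.
    static int countUnusableSafely(List<String> reportedStates) {
        int unusable = 0;
        for (String reported : reportedStates) {
            OldNodeState state = decodeOnOldClient(reported);
            if (state != null && state.isUnusable()) {
                unusable++;
            }
        }
        return unusable;
    }

    public static void main(String[] args) {
        List<String> update = Arrays.asList("RUNNING", "DECOMMISSIONING");
        System.out.println("safe count: " + countUnusableSafely(update));
        try {
            countUnusable(update);
        } catch (NullPointerException e) {
            System.out.println("unguarded loop threw NullPointerException");
        }
    }
}
```

      Running main prints `safe count: 0` followed by `unguarded loop threw NullPointerException`: the guarded loop skips the DECOMMISSIONING report, while the unguarded loop fails the way the 2.6 AM does.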

      Attachments

        1. YARN-10791.v1.patch
          10 kB
          Song Jiacheng
        2. image-2021-05-31-10-37-31-795.png
          13 kB
          Song Jiacheng
        3. image-2021-05-31-10-32-17-541.png
          19 kB
          Song Jiacheng


          People

            Assignee: Unassigned
            Reporter: Song Jiacheng
            Votes: 0
            Watchers: 4
